From 2259a74c14c1a843dd45da585b50b43c4b9f9e29 Mon Sep 17 00:00:00 2001
From: Varun Gandhi <varun.gandhi@sourcegraph.com>
Date: Mon, 21 Oct 2024 18:15:52 +0800
Subject: [PATCH 1/2] docs: Update for new auto-indexing limits

---
 .../inference_configuration.mdx               | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)
diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
index 6b90c8cf0..1bc9f8242 100644
--- a/docs/code-search/code-navigation/inference_configuration.mdx
+++ b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*
 
 A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in, then invokes its `generate` function with matching paths from a repository. In the following, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc) are passed to a call to `generate` (if non-empty). The generate function will then return a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) gives detailed descriptions on the fields of this object.
 
+The ordering of paths and the limits on them are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.
+
 ```lua
 local path = require("path")
 local pattern = require("sg.autoindex.patterns")
@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
 ### `fun`
 
 [Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
+
+## Ordering guarantees and limits
+
+Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository and are combined across all recognizers, including any implicitly included Sourcegraph recognizers.
+
+Limit | Default value
+:-----|-------------:
+The number of auto-indexing jobs inferred | 100
+The number of total paths passed to the inference script's `generate` functions as the second argument `paths` | 500
+The number of total paths with contents passed to the inference script's `generate` functions as the third argument `contents_by_paths` | 100
+Maximum size of file contents | 1 MiB
+
+<Callout type="note">Please reach out to Sourcegraph support if you'd like to change these limits.</Callout>
+<!--
+We deliberately don't document the environment variables for
+changing these limits as customers should generally not be changing
+them without good reason. So it's better to have them at least
+reach out to us first, and we can advise them on a case-by-case basis.
+-->
+
+If the number of auto-indexing jobs and/or paths exceeds these limits, excess jobs and/or paths will be sorted based on the following ordering and trailing jobs will be discarded:
+
+- For auto-indexing jobs, sorting happens in the following order:
+
+  - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
+  - Lexicographic ordering of `indexer`.
+  - Descending order of number of path components for `root`. Shallower roots are preferrred over deeper ones as they are more likely to cover more code.
+  - Lexicographic ordering of `root` paths.
+
+- For paths, sorting happens in the following order:
+
+  - Paths for which the contents are requested come before other paths.
+  - Paths with fewer components come before paths with more components.
+  - Otherwise, paths are ordered lexicographically.

From fe91f824eb956e8db9a68dbee94ca00a54c3ad65 Mon Sep 17 00:00:00 2001
From: Varun Gandhi <varun.gandhi@sourcegraph.com>
Date: Tue, 22 Oct 2024 12:53:01 +0800
Subject: [PATCH 2/2] Address review comments

---
 .../code-navigation/inference_configuration.mdx  | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
index 1bc9f8242..3708775a3 100644
--- a/docs/code-search/code-navigation/inference_configuration.mdx
+++ b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -235,17 +235,17 @@ them without good reason. So it's better to have them at least
 reach out to us first, and we can advise them on a case-by-case basis.
 -->
 
-If the number of auto-indexing jobs and/or paths exceeds these limits, excess jobs and/or paths will be sorted based on the following ordering and trailing jobs will be discarded:
+Auto-indexing jobs and paths are first ranked based on the criteria described below. If the number of jobs and/or paths exceeds the limits above, lower ranked items are discarded.
 
-- For auto-indexing jobs, sorting happens in the following order:
+- For auto-indexing jobs, ranking is done based on the following:
 
   - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
-  - Lexicographic ordering of `indexer`.
+  - Ascending lexicographic ordering of `indexer`.
   - Descending order of number of path components for `root`. Shallower roots are preferrred over deeper ones as they are more likely to cover more code.
-  - Lexicographic ordering of `root` paths.
+  - Ascending lexicographic ordering of `root` paths.
 
-- For paths, sorting happens in the following order:
+- For paths, ranking happens in the following order:
 
-  - Paths for which the contents are requested come before other paths.
-  - Paths with fewer components come before paths with more components.
-  - Otherwise, paths are ordered lexicographically.
+  - Paths for which the contents are requested are ranked higher.
+  - Paths with fewer components are ranked higher.
+  - Otherwise, lexicographic ordering of paths is used.
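
Reviewer note: the ranking criteria in the final hunk can be sketched as two sort keys. This is only an illustration in Python, not Sourcegraph's implementation. The names `Job`, `depth`, `rank_jobs` and `rank_paths` are hypothetical, as are the indexer names in the usage example. It assumes that "lexicographic" means plain string comparison, and that shallower roots rank first, following the stated rationale rather than the word "Descending" in that bullet. The separate limit on paths with contents is not modeled.

```python
from collections import Counter
from typing import NamedTuple


class Job(NamedTuple):
    """Hypothetical stand-in for an inferred auto-indexing job."""
    indexer: str
    root: str


def depth(path: str) -> int:
    """Number of non-empty path components; the repository root "" has depth 0."""
    return len([part for part in path.split("/") if part])


def rank_jobs(jobs, limit=100):
    """Rank jobs by the criteria listed in the patch, then keep the top `limit`."""
    freq = Counter(job.indexer for job in jobs)
    ranked = sorted(
        jobs,
        key=lambda job: (
            -freq[job.indexer],  # more frequent indexers first
            job.indexer,         # ascending indexer name
            depth(job.root),     # assumption: shallower roots first
            job.root,            # ascending root path
        ),
    )
    return ranked[:limit]


def rank_paths(paths, wants_contents, limit=500):
    """Rank paths: contents-requested first, then shallower, then by string order."""
    ranked = sorted(
        paths,
        key=lambda p: (p not in wants_contents, depth(p), p),
    )
    return ranked[:limit]


if __name__ == "__main__":
    jobs = [
        Job("indexer-a", "a/b"),
        Job("indexer-a", ""),
        Job("indexer-b", "web"),
        Job("indexer-a", "z"),
    ]
    # With a limit of 3, the single job for the less frequent indexer is discarded.
    for job in rank_jobs(jobs, limit=3):
        print(job)
```

Expressing each ranking as a single tuple key keeps the tie-breaking order explicit: a later criterion only matters when all earlier ones compare equal.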