sourcegraph · varungandhi-src · Oct 22, 2024 · Oct 21, 2024 · Oct 22, 2024 · MaedahBatool
diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*
 
 A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in, then invokes its `generate` function with matching paths from a repository. In the following, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc) are passed to a call to `generate` (if non-empty). The generate function will then return a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) gives detailed descriptions on the fields of this object.
 
+Limits on the number of paths, and the order in which paths are ranked when those limits apply, are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.
+
 ```lua
 local path = require("path")
 local pattern = require("sg.autoindex.patterns")
@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
 ### `fun`
 
 [Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
+
+## Ordering guarantees and limits
+
+Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository, and are counted across all recognizers combined, including any Sourcegraph recognizers that are included implicitly.
+
+Limit | Default value
+:-----|-------------:
+Number of inferred auto-indexing jobs | 100
+Total number of paths passed as the second argument (`paths`) to the inference script's `generate` functions | 500
+Total number of paths whose contents are passed as the third argument (`contents_by_paths`) to the inference script's `generate` functions | 100
+Maximum size of file contents | 1 MiB
+
+<Callout type="note">Please reach out to Sourcegraph support if you'd like to change these limits.</Callout>
+<!--
+We deliberately don't document the environment variables for
+changing these limits as customers should generally not be changing
+them without good reason. So it's better to have them at least
+reach out to us first, and we can advise them on a case-by-case basis.
+-->
+
+Auto-indexing jobs and paths are first ranked using the criteria below. If the number of jobs or paths exceeds the corresponding limit above, the lowest-ranked items are discarded.
+
+- Auto-indexing jobs are ranked by the following criteria, in order of precedence:
+
+  - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
+  - Ascending lexicographic ordering of `indexer`.
+  - Ascending order of the number of path components in `root`. Shallower roots are preferred over deeper ones, as they are more likely to cover more code.
+  - Ascending lexicographic ordering of `root` paths.
+
+- Paths are ranked by the following criteria, in order of precedence:
+
+  - Paths for which the contents are requested are ranked higher.
+  - Paths with fewer components are ranked higher.
+  - Remaining ties are broken by ascending lexicographic order of the path.
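+
+The following Lua snippet is a minimal sketch of the ranking rules above, intended only to make the tie-breaking order concrete. It is not Sourcegraph's implementation: the `rank_jobs`, `rank_paths`, `num_components`, and `take` helpers, the job shape (tables with `indexer` and `root` fields), and the `wants_contents` set are all hypothetical, and path components are assumed to be separated by `/`.
+
+```lua
+-- Hypothetical sketch of the ranking rules; not Sourcegraph's actual code.
+-- Assumes jobs are tables with string `indexer` and `root` fields, and that
+-- path components are separated by "/".
+
+-- Number of "/"-separated components; the repository root ("") has zero.
+local function num_components(p)
+  local n = 0
+  for _ in string.gmatch(p, "[^/]+") do
+    n = n + 1
+  end
+  return n
+end
+
+-- Copy the first `limit` entries of an already-sorted list.
+local function take(list, limit)
+  local kept = {}
+  for i = 1, math.min(limit, #list) do
+    kept[i] = list[i]
+  end
+  return kept
+end
+
+-- Sorts `jobs` in place by the job criteria, then keeps at most `limit`.
+local function rank_jobs(jobs, limit)
+  local freq = {} -- number of inferred jobs per indexer
+  for _, job in ipairs(jobs) do
+    freq[job.indexer] = (freq[job.indexer] or 0) + 1
+  end
+  table.sort(jobs, function(a, b)
+    if freq[a.indexer] ~= freq[b.indexer] then
+      return freq[a.indexer] > freq[b.indexer] -- more frequent indexers first
+    end
+    if a.indexer ~= b.indexer then
+      return a.indexer < b.indexer -- then indexer name, ascending
+    end
+    local ca, cb = num_components(a.root), num_components(b.root)
+    if ca ~= cb then
+      return ca < cb -- then shallower roots first
+    end
+    return a.root < b.root -- then root path, ascending
+  end)
+  return take(jobs, limit)
+end
+
+-- Sorts `paths` in place by the path criteria, then keeps at most `limit`.
+-- `wants_contents` is a hypothetical set: wants_contents[path] == true when
+-- the contents of `path` were requested.
+local function rank_paths(paths, wants_contents, limit)
+  table.sort(paths, function(a, b)
+    local wa, wb = wants_contents[a] == true, wants_contents[b] == true
+    if wa ~= wb then
+      return wa -- paths with requested contents first
+    end
+    local ca, cb = num_components(a), num_components(b)
+    if ca ~= cb then
+      return ca < cb -- then fewer components first
+    end
+    return a < b -- then path, ascending
+  end)
+  return take(paths, limit)
+end
+
+-- Usage: with a job limit of 2, the single scip-typescript job is discarded.
+local jobs = {
+  { indexer = "scip-go", root = "svc/api" },
+  { indexer = "scip-typescript", root = "web" },
+  { indexer = "scip-go", root = "" },
+}
+for _, job in ipairs(rank_jobs(jobs, 2)) do
+  print(job.indexer .. " @ " .. (job.root == "" and "<repo root>" or job.root))
+end
+-- scip-go @ <repo root>
+-- scip-go @ svc/api
+```
+
+Checking each criterion in turn inside a single comparator keeps the precedence explicit. Note that `table.sort` sorts in place, and that in standard Lua the `<` operator on strings follows the C library's collation, which is byte-wise in the default `C` locale.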