diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
index 6b90c8cf0..3708775a3 100644
--- a/docs/code-search/code-navigation/inference_configuration.mdx
+++ b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*
 
 A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in, then invokes its `generate` function with matching paths from a repository. In the following, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc) are passed to a call to `generate` (if non-empty). The generate function will then return a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) gives detailed descriptions on the fields of this object.
 
+Ordering and limits for inferred jobs and paths are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.
+
 ```lua
 local path = require("path")
 local pattern = require("sg.autoindex.patterns")
@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
 ### `fun`
 
 [Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
+
+## Ordering guarantees and limits
+
+Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository, combined across all recognizers, including any implicitly included Sourcegraph recognizers.
+
+Limit | Default value
+:-----|-------------:
+Number of auto-indexing jobs inferred | 100
+Total number of paths passed to the inference script's `generate` functions as the second argument, `paths` | 500
+Total number of paths with contents passed to the inference script's `generate` functions as the third argument, `contents_by_paths` | 100
+Maximum size of file contents | 1 MiB
+
+Please reach out to Sourcegraph support if you'd like to change these limits.
+
+
+Auto-indexing jobs and paths are first ranked using the criteria described below. If the number of jobs or paths exceeds the limits above, lower-ranked items are discarded.
+
+- For auto-indexing jobs, ranking is based on the following, in order:
+
+  - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
+  - Ascending lexicographic ordering of `indexer`.
+  - Ascending order of the number of path components in `root`. Shallower roots are preferred over deeper ones, as they are more likely to cover more code.
+  - Ascending lexicographic ordering of `root` paths.
+
+- For paths, ranking uses the following, in order:
+
+  - Paths whose contents are requested are ranked higher.
+  - Paths with fewer components are ranked higher.
+  - Otherwise, paths are ordered lexicographically.
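The ranking rules above can be sketched in Lua, the language recognizers are written in. This is a hypothetical illustration only, not Sourcegraph's implementation: the job shape (a table with just `indexer` and `root`), the helper names `path_components`, `rank_jobs`, and `rank_paths`, and the `wants_contents` set are all assumptions. The sketch orders roots shallowest first, following the stated preference for shallower roots, and `rank_paths` applies a single `limit`; it does not model the separate limit on paths with contents.

```lua
-- Hypothetical sketch of the ranking rules; NOT Sourcegraph's implementation.
-- Assumed job shape: { indexer = "<name>", root = "<relative/path>" }.

-- Count the non-empty components of a slash-separated path ("" counts as 0).
local function path_components(p)
  local n = 0
  for _ in string.gmatch(p, "[^/]+") do
    n = n + 1
  end
  return n
end

-- Copy the first `limit` entries of a ranked list, discarding the rest.
local function truncate(list, limit)
  local kept = {}
  for i = 1, math.min(limit, #list) do
    kept[i] = list[i]
  end
  return kept
end

-- Return at most `limit` jobs, ordered by the job ranking criteria.
local function rank_jobs(jobs, limit)
  -- Indexer frequency: how many inferred jobs share the same `indexer`.
  local freq = {}
  for _, job in ipairs(jobs) do
    freq[job.indexer] = (freq[job.indexer] or 0) + 1
  end

  -- Sort a copy so the caller's list is left untouched.
  local ranked = {}
  for i, job in ipairs(jobs) do
    ranked[i] = job
  end

  table.sort(ranked, function(a, b)
    if freq[a.indexer] ~= freq[b.indexer] then
      return freq[a.indexer] > freq[b.indexer] -- more frequent indexers first
    end
    if a.indexer ~= b.indexer then
      return a.indexer < b.indexer -- then indexer name, ascending
    end
    local da, db = path_components(a.root), path_components(b.root)
    if da ~= db then
      return da < db -- then shallower roots first
    end
    -- Lua's string `<` uses the current locale (byte-wise under the C locale).
    return a.root < b.root -- then root path, ascending
  end)

  return truncate(ranked, limit)
end

-- Return at most `limit` paths. `wants_contents` is a set (path -> true) of
-- paths whose contents were requested; those are ranked first.
local function rank_paths(paths, wants_contents, limit)
  local ranked = {}
  for i, p in ipairs(paths) do
    ranked[i] = p
  end

  table.sort(ranked, function(a, b)
    local ca, cb = wants_contents[a] == true, wants_contents[b] == true
    if ca ~= cb then
      return ca -- paths with requested contents first
    end
    local da, db = path_components(a), path_components(b)
    if da ~= db then
      return da < db -- then fewer path components
    end
    return a < b -- then lexicographic order
  end)

  return truncate(ranked, limit)
end
```

For example, given three `scip-go` jobs rooted at `svc/api`, `lib`, and the repository root (`""`), plus one `scip-typescript` job, `rank_jobs(jobs, 3)` keeps only the three `scip-go` jobs, shallowest root first, because `scip-go` is the more frequent indexer.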