Commit 61d1dfc

docs: Update for new auto-indexing limits
1 parent ba9e9bc commit 61d1dfc

1 file changed

+36
-0
lines changed

docs/code-search/code-navigation/inference_configuration.mdx

Lines changed: 36 additions & 0 deletions
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*
4040

4141
A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in; its `generate` function is then invoked with the matching paths from a repository. In the following example, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc.) are passed to a call to `generate` (if the set of matches is non-empty). The `generate` function then returns a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) describes the fields of this object in detail.
4242

43+
The ordering of paths and the limits that apply to them are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.
44+
4345
```lua
4446
local path = require("path")
4547
local pattern = require("sg.autoindex.patterns")
@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
213215
### `fun`
214216

215217
[Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
218+
219+
## Ordering guarantees and limits
220+
221+
Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository, and each limit is counted across all recognizers combined, including any implicitly included Sourcegraph recognizers.
222+
223+
Limit | Default value
224+
:-----|-------------:
225+
The number of auto-indexing jobs inferred | 100
226+
The total number of paths passed to the inference script's `generate` functions as the second argument `paths` | 500
227+
The total number of paths with contents passed to the inference script's `generate` functions as the third argument `contents_by_paths` | 100
228+
Maximum size of file contents | 1 MiB
229+
230+
<Callout type="note">Please reach out to Sourcegraph support if you'd like to change these limits.</Callout>
231+
<!--
232+
We deliberately don't document the environment variables for
233+
changing these limits as customers should generally not be changing
234+
them without good reason. So it's better to have them at least
235+
reach out to us first, and we can advise them on a case-by-case basis.
236+
-->
237+
238+
If the number of auto-indexing jobs or paths exceeds these limits, the jobs or paths are sorted according to the following ordering, and any that fall beyond the limit are discarded:
239+
240+
- For auto-indexing jobs, sorting happens in the following order:
241+
242+
- Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
243+
- Lexicographic ordering of `indexer`.
244+
- Number of path components in `root`, with fewer components first. Shallower roots are preferred over deeper ones as they are more likely to cover more code.
245+
- Lexicographic ordering of `root` paths.
246+
247+
- For paths, sorting happens in the following order:
248+
249+
- Paths for which the contents are requested come before other paths.
250+
- Paths with fewer components come before paths with more components.
251+
- Otherwise, paths are ordered lexicographically.
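
The ordering rules above can be sketched in Lua as a pair of comparators. This is only an illustration under stated assumptions, not Sourcegraph's implementation: `component_count`, `order_jobs`, `order_paths`, and the `wants_contents` lookup table are hypothetical names, the indexer names and paths are placeholders, and the sketch assumes that shallower roots sort first, as the stated preference for shallower roots suggests.

```lua
-- Hypothetical helper: count the "/"-separated components of a path.
-- The empty string (the repository root) is assumed to have zero components.
local function component_count(p)
  local n = 0
  for _ in string.gmatch(p, "[^/]+") do
    n = n + 1
  end
  return n
end

-- Hypothetical helper: return the first `limit` elements of a list.
local function take(list, limit)
  local kept = {}
  for i = 1, math.min(limit, #list) do
    kept[i] = list[i]
  end
  return kept
end

-- Sort jobs by the documented keys (each key breaks ties in the previous
-- one), then keep only the first `limit` jobs. Sorts `jobs` in place.
local function order_jobs(jobs, limit)
  -- Indexer frequency: number of inferred jobs sharing the same `indexer`.
  local freq = {}
  for _, job in ipairs(jobs) do
    freq[job.indexer] = (freq[job.indexer] or 0) + 1
  end
  table.sort(jobs, function(a, b)
    if freq[a.indexer] ~= freq[b.indexer] then
      return freq[a.indexer] > freq[b.indexer] -- more frequent indexers first
    end
    if a.indexer ~= b.indexer then
      return a.indexer < b.indexer -- lexicographic by indexer
    end
    local ca, cb = component_count(a.root), component_count(b.root)
    if ca ~= cb then
      return ca < cb -- assumption: shallower roots first
    end
    return a.root < b.root -- lexicographic by root
  end)
  return take(jobs, limit)
end

-- Sort paths by the documented keys, then keep only the first `limit` paths.
-- `wants_contents` maps a path to true when its contents were requested.
local function order_paths(paths, wants_contents, limit)
  table.sort(paths, function(a, b)
    local wa, wb = wants_contents[a] == true, wants_contents[b] == true
    if wa ~= wb then
      return wa -- paths with requested contents first
    end
    local ca, cb = component_count(a), component_count(b)
    if ca ~= cb then
      return ca < cb -- fewer components first
    end
    return a < b -- lexicographic otherwise
  end)
  return take(paths, limit)
end

-- Example with placeholder data and a limit of 3 in both cases.
local kept_jobs = order_jobs({
  { indexer = "scip-go", root = "a/b" },
  { indexer = "scip-typescript", root = "web" },
  { indexer = "scip-go", root = "" },
  { indexer = "scip-go", root = "z" },
}, 3)

local kept_paths = order_paths(
  { "proj/sub/Snek.module", "README.md", "proj/Snek.module", "Snek.module" },
  { ["proj/sub/Snek.module"] = true },
  3
)
```

With this placeholder input, the `scip-typescript` job is the one that is dropped, because its indexer is the least frequent, and `proj/Snek.module` is the path that is dropped, because it has more components than the other paths whose contents were not requested.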
