36 changes: 36 additions & 0 deletions docs/code-search/code-navigation/inference_configuration.mdx
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*

A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in; its `generate` function is then invoked with the matching paths from a repository. In the following example, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc.) are passed to a call to `generate`, provided the set of matches is non-empty. The `generate` function then returns a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) describes the fields of this object in detail.

The ordering of paths and the limits that apply to them are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
```

@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
### `fun`

[Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.

## Ordering guarantees and limits

Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository, combined across all recognizers, including any implicitly included Sourcegraph recognizers.

Limit | Default value
:-----|-------------:
Number of auto-indexing jobs inferred | 100
Total number of paths passed to the inference script's `generate` functions as the second argument, `paths` | 500
Total number of paths with contents passed to the inference script's `generate` functions as the third argument, `contents_by_paths` | 100
Maximum size of a file's contents | 1 MiB

<Callout type="note">Please reach out to Sourcegraph support if you'd like to change these limits.</Callout>
<!--
Hey @varungandhi-src, this comment format is not supported in MDX and was failing all production deployments in Vercel. I have removed this part, as it was blocking all upcoming commits to our main.

We deliberately don't document the environment variables for
changing these limits as customers should generally not be changing
them without good reason. So it's better to have them at least
reach out to us first, and we can advise them on a case-by-case basis.
-->

Auto-indexing jobs and paths are first ranked based on the criteria described below. If the number of jobs and/or paths exceeds the limits above, lower-ranked items are discarded.

- For auto-indexing jobs, ranking is done based on the following criteria, in order:

  - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
  - Ascending lexicographic ordering of `indexer`.
  - Ascending order of number of path components for `root`. Shallower roots are preferred over deeper ones as they are more likely to cover more code.
  - Ascending lexicographic ordering of `root` paths.

- For paths, ranking happens in the following order:

  - Paths for which the contents are requested are ranked higher.
  - Paths with fewer components are ranked higher.
  - Otherwise, lexicographic ordering of paths is used.
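
To make the ranking rules above concrete, here is a minimal sketch in plain Lua. It is not Sourcegraph's implementation: the helper names (`component_count`, `rank_jobs`, `rank_paths`, `take`) and the sample indexer names are hypothetical, and it assumes that shallower roots rank first, as the explanation of the `root` criterion indicates. It only illustrates how a multi-key ordering followed by truncation decides which items survive a limit.

```lua
-- Hypothetical helpers illustrating the documented ranking; not Sourcegraph's code.

-- Count path components: "" -> 0, "lib" -> 1, "services/api" -> 2.
local function component_count(path)
  local n = 0
  for _ in string.gmatch(path, "[^/]+") do
    n = n + 1
  end
  return n
end

-- Rank jobs by: indexer frequency (descending), indexer name (ascending),
-- root depth (shallower first, an assumption), then root path (ascending).
local function rank_jobs(jobs)
  local frequency = {}
  for _, job in ipairs(jobs) do
    frequency[job.indexer] = (frequency[job.indexer] or 0) + 1
  end
  local ranked = {}
  for i, job in ipairs(jobs) do ranked[i] = job end
  table.sort(ranked, function(a, b)
    if frequency[a.indexer] ~= frequency[b.indexer] then
      return frequency[a.indexer] > frequency[b.indexer]
    end
    if a.indexer ~= b.indexer then
      return a.indexer < b.indexer
    end
    local da, db = component_count(a.root), component_count(b.root)
    if da ~= db then
      return da < db
    end
    return a.root < b.root
  end)
  return ranked
end

-- Rank paths by: contents requested first, fewer components, then lexicographic.
-- `contents_requested` is a set: contents_requested[path] == true.
local function rank_paths(paths, contents_requested)
  local ranked = {}
  for i, p in ipairs(paths) do ranked[i] = p end
  table.sort(ranked, function(a, b)
    local ra = contents_requested[a] == true
    local rb = contents_requested[b] == true
    if ra ~= rb then
      return ra
    end
    local da, db = component_count(a), component_count(b)
    if da ~= db then
      return da < db
    end
    return a < b
  end)
  return ranked
end

-- Keep only the first `limit` items of an already-ranked list.
local function take(items, limit)
  local kept = {}
  for i = 1, math.min(limit, #items) do kept[i] = items[i] end
  return kept
end

-- Usage with illustrative values and a limit of 3 jobs.
local jobs = {
  { indexer = "indexer-a", root = "services/api" },
  { indexer = "indexer-b", root = "web" },
  { indexer = "indexer-a", root = "" },
  { indexer = "indexer-a", root = "lib" },
}
local kept_jobs = take(rank_jobs(jobs), 3)
-- kept_jobs roots, in order: "", "lib", "services/api" (the "indexer-b" job is discarded)

local ranked_paths = rank_paths(
  { "b/go.mod", "go.mod", "a/b/package.json", "a/go.mod" },
  { ["a/b/package.json"] = true }
)
-- ranked_paths: "a/b/package.json", "go.mod", "a/go.mod", "b/go.mod"
```

In this sketch the lone `indexer-b` job is dropped because its indexer is the least frequent, even though its root is shallow. The real limits are the defaults listed in the table above.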