From 2259a74c14c1a843dd45da585b50b43c4b9f9e29 Mon Sep 17 00:00:00 2001
From: Varun Gandhi
Date: Mon, 21 Oct 2024 18:15:52 +0800
Subject: [PATCH 1/2] docs: Update for new auto-indexing limits

---
 .../inference_configuration.mdx | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
index 6b90c8cf0..1bc9f8242 100644
--- a/docs/code-search/code-navigation/inference_configuration.mdx
+++ b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -40,6 +40,8 @@ To **add** additional behaviors, you can create and register a new **recognizer*
 
 A _path recognizer_ is a concrete recognizer that advertises a set of path _globs_ it is interested in, then invokes its `generate` function with matching paths from a repository. In the following, all files matching `Snek.module` (`Snek.module`, `proj/Snek.module`, `proj/sub/Snek.module`, etc) are passed to a call to `generate` (if non-empty). The generate function will then return a list of indexing job descriptions. The [guide for auto-indexing jobs configuration](/code-search/code-navigation/auto_indexing_configuration#keys-1) gives detailed descriptions on the fields of this object.
 
+Path ordering and the applicable limits are described in the [Ordering guarantees and limits](#ordering-guarantees-and-limits) section.
+
 ```lua
 local path = require("path")
 local pattern = require("sg.autoindex.patterns")
@@ -213,3 +215,37 @@ This library defines the following two JSON utility functions:
 ### `fun`
 
 [Lua Functional](https://github.com/luafun/luafun/tree/cb6a7e25d4b55d9578fd371d1474b00e47bd29f3#lua-functional) is a high-performance functional programming library accessible via `local fun = require("fun")`. This library has a number of functional utilities to help make recognizer code a bit more expressive.
+
+## Ordering guarantees and limits
+
+Sourcegraph enforces several limits to avoid inference timeouts and ever-growing auto-indexing queues. These limits apply to a single round of inference for a single repository, combined across all recognizers, including any implicitly included Sourcegraph recognizers.
+
+Limit | Default value
+:-----|-------------:
+The number of auto-indexing jobs inferred | 100
+The number of total paths passed to the inference script's `generate` functions as the second argument `paths` | 500
+The number of total paths with contents passed to the inference script's `generate` functions as the third argument `contents_by_paths` | 100
+Maximum size of file contents | 1 MiB
+
+Please reach out to Sourcegraph support if you'd like to change these limits.
+
+
+If the number of auto-indexing jobs and/or paths exceeds these limits, excess jobs and/or paths will be sorted based on the following ordering and trailing jobs will be discarded:
+
+- For auto-indexing jobs, sorting happens in the following order:
+
+  - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
+  - Lexicographic ordering of `indexer`.
+  - Descending order of number of path components for `root`. Shallower roots are preferred over deeper ones as they are more likely to cover more code.
+  - Lexicographic ordering of `root` paths.
+
+- For paths, sorting happens in the following order:
+
+  - Paths for which the contents are requested come before other paths.
+  - Paths with fewer components come before paths with more components.
+  - Otherwise, paths are ordered lexicographically.
From fe91f824eb956e8db9a68dbee94ca00a54c3ad65 Mon Sep 17 00:00:00 2001
From: Varun Gandhi
Date: Tue, 22 Oct 2024 12:53:01 +0800
Subject: [PATCH 2/2] Address review comments

---
 .../code-navigation/inference_configuration.mdx | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/code-search/code-navigation/inference_configuration.mdx b/docs/code-search/code-navigation/inference_configuration.mdx
index 1bc9f8242..3708775a3 100644
--- a/docs/code-search/code-navigation/inference_configuration.mdx
+++ b/docs/code-search/code-navigation/inference_configuration.mdx
@@ -235,17 +235,17 @@ them without good reason. So it's better to have them at least
 reach out to us first, and we can advise them on a case-by-case basis.
 -->
 
-If the number of auto-indexing jobs and/or paths exceeds these limits, excess jobs and/or paths will be sorted based on the following ordering and trailing jobs will be discarded:
+Auto-indexing jobs and paths are first ranked based on the criteria described below. If the number of jobs and/or paths exceeds the limits above, lower-ranked items are discarded.
 
-- For auto-indexing jobs, sorting happens in the following order:
+- For auto-indexing jobs, ranking is done based on the following:
 
   - Descending order of indexer frequency (total number of inferred jobs with the same `indexer` field).
-  - Lexicographic ordering of `indexer`.
+  - Ascending lexicographic ordering of `indexer`.
   - Descending order of number of path components for `root`. Shallower roots are preferred over deeper ones as they are more likely to cover more code.
-  - Lexicographic ordering of `root` paths.
+  - Ascending lexicographic ordering of `root` paths.
 
-- For paths, sorting happens in the following order:
+- For paths, ranking happens in the following order:
 
-  - Paths for which the contents are requested come before other paths.
-  - Paths with fewer components come before paths with more components.
-  - Otherwise, paths are ordered lexicographically.
+  - Paths for which the contents are requested are ranked higher.
+  - Paths with fewer components are ranked higher.
+  - Otherwise, lexicographic ordering of paths is used.
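
The ranking rules in this patch can be illustrated with a small Lua sketch. Lua is used because recognizers are written in it. This is a hypothetical illustration, not Sourcegraph's implementation, and several details are assumptions:

- The helper names `component_count`, `rank_and_truncate`, `rank_paths` and `rank_jobs` are invented for this sketch.
- The table shapes are invented as well: `{ path, wants_contents }` for paths and `{ indexer, root }` for jobs.
- The `limit` argument is also an assumption.
- Path depth is taken to be the number of `/`-separated components.
- Shallower roots are ranked first, following the rationale stated in the `root` bullet.

```lua
-- Hypothetical sketch of the ranking-and-truncation rules described above.
-- Not Sourcegraph's implementation; names and table shapes are assumptions.
-- Note: Lua's string `<` uses strcoll, i.e. byte order in the C locale.

-- Number of "/"-separated components, e.g. "proj/sub/Snek.module" -> 3, "" -> 0.
local function component_count(p)
  local n = 0
  for _ in string.gmatch(p, "[^/]+") do
    n = n + 1
  end
  return n
end

-- Sort a shallow copy of `items` with comparator `before`, keep at most `limit`.
local function rank_and_truncate(items, before, limit)
  local copy = {}
  for i, item in ipairs(items) do
    copy[i] = item
  end
  table.sort(copy, before)
  local kept = {}
  for i = 1, math.min(#copy, limit) do
    kept[i] = copy[i]
  end
  return kept
end

-- Paths: { path = "...", wants_contents = true | false }.
local function rank_paths(paths, limit)
  return rank_and_truncate(paths, function(a, b)
    local wa, wb = a.wants_contents == true, b.wants_contents == true
    if wa ~= wb then
      return wa -- paths whose contents are requested rank higher
    end
    local ca, cb = component_count(a.path), component_count(b.path)
    if ca ~= cb then
      return ca < cb -- fewer components rank higher
    end
    return a.path < b.path -- otherwise, lexicographic order
  end, limit)
end

-- Jobs: { indexer = "...", root = "..." }.
local function rank_jobs(jobs, limit)
  -- Indexer frequency: how many inferred jobs share the same `indexer`.
  local freq = {}
  for _, job in ipairs(jobs) do
    freq[job.indexer] = (freq[job.indexer] or 0) + 1
  end
  return rank_and_truncate(jobs, function(a, b)
    local fa, fb = freq[a.indexer], freq[b.indexer]
    if fa ~= fb then
      return fa > fb -- more frequent indexers rank higher
    end
    if a.indexer ~= b.indexer then
      return a.indexer < b.indexer -- ascending by indexer name
    end
    local da, db = component_count(a.root), component_count(b.root)
    if da ~= db then
      return da < db -- ASSUMPTION: shallower roots rank higher
    end
    return a.root < b.root -- ascending by root path
  end, limit)
end

-- Usage: keep the top 2 of 3 paths.
local ranked = rank_paths({
  { path = "proj/sub/Snek.module", wants_contents = false },
  { path = "Snek.module", wants_contents = false },
  { path = "proj/Snek.module", wants_contents = true },
}, 2)
for _, p in ipairs(ranked) do
  print(p.path)
end
-- prints "proj/Snek.module" (contents requested), then "Snek.module" (fewer components)
```

In this sketch, indexer frequency is counted over the full list of inferred jobs before any truncation. That follows the wording "total number of inferred jobs with the same `indexer` field".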