Commit c7efb73

Zoekt indexer trigram and file size limits (#1436)
The documentation was updated in `search.mdx` to specify that the Zoekt indexer skips files exceeding 20,000 unique trigrams or those that are not valid UTF-8. Instructions were added detailing how to override these limits by configuring the `search.largeFiles` setting and reindexing the repository.

Thread: https://ampcode.com/threads/T-0390a39a-9c04-441e-8982-7e2ef7b9bf76

Co-authored-by: Amp <[email protected]>
1 parent c6fbaa2 commit c7efb73

1 file changed, +11 -2 lines changed

docs/admin/search.mdx

Lines changed: 11 additions & 2 deletions
@@ -67,9 +67,18 @@ will not return any result.
 
 ## Indexed search
 
-Sourcegraph indexes the code on the default branch of each repository. This speeds up searches that hit many repositories at once. Not all files in a repository branch are indexed, we skip files that are [larger than 1 MB](#maximum-file-size) and binary files. To view which files are skipped during indexing, visit the repository settings page and click on indexing.
+Sourcegraph indexes the code on the default branch of each repository. This speeds up searches that hit many repositories at once. Not all files in a repository branch are indexed. We skip:
 
-For large deployments we recommend horizontally scaling indexed search. You can do this by [adjusting the number of replicas](https://github.com/sourcegraph/deploy-sourcegraph/blob/master/docs/configure#configure-indexed-search-replica-count). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
+- Files that are [larger than 1 MB](#maximum-file-size).
+- Binary files.
+- Files exceeding 20,000 unique trigrams (sequences of three characters).
+- Files that are not valid UTF-8.
+
+To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
+
+To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [search.largeFiles](https://sourcegraph.com/docs/admin/search#maximum-file-size) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
+
+For large deployments we recommend horizontally scaling indexed search. You can do this by adjusting the [number of replicas](https://sourcegraph.com/docs/admin/deploy/kubernetes/configure). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
 
 The resource requirements for indexed search vary considerably based on the text contents of your repositories, but a good estimate is that the node should have enough memory to hold the entire text contents of the default branch of each repository.
 
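Editor's sketch of the skip rules added in this diff. It is a rough Python illustration, not Zoekt's implementation. Several things here are assumptions rather than facts from the commit: the function names, the exact byte value used for "1 MB", counting trigrams over decoded characters rather than bytes, and the use of `fnmatchcase` as a stand-in for however `search.largeFiles` actually matches paths and glob patterns. Binary-file detection is left out because the diff does not say how the override interacts with it.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence

MAX_FILE_SIZE = 1024 * 1024  # "1 MB" from the diff; the exact byte count is an assumption
TRIGRAM_LIMIT = 20_000       # unique-trigram limit quoted in the diff


def unique_trigram_count(text: str) -> int:
    """Count the distinct three-character sequences in text."""
    return len({text[i:i + 3] for i in range(len(text) - 2)})


def skip_reason(path: str, data: bytes,
                large_files: Sequence[str] = ()) -> Optional[str]:
    """Return why a file would be skipped, or None if it would be indexed."""
    # Checked first: per the diff, the override cannot rescue invalid UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # Hypothetical stand-in for the real setting's path/glob matching.
    if any(fnmatchcase(path, pattern) for pattern in large_files):
        return None
    if len(data) > MAX_FILE_SIZE:
        return "larger than 1 MB"
    if unique_trigram_count(text) > TRIGRAM_LIMIT:
        return "too many unique trigrams"
    return None


big = ("x" * 2_000_000).encode()
print(skip_reason("yarn.lock", big))                 # larger than 1 MB
print(skip_reason("yarn.lock", big, ["yarn.lock"]))  # None
print(skip_reason("data.bin", b"\xff\xfe", ["*"]))   # not valid UTF-8
```

The ordering of the checks mirrors the note in the added text: a matching override lifts the size and trigram limits, while the UTF-8 requirement still applies.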
