Add configurable index.max_doc_id_length setting (#19075) #20919
sakrah wants to merge 3 commits into opensearch-project:main from
Conversation
Add configurable index.max_doc_id_length setting (#19075)

Introduce a per-index setting index.max_doc_id_length that allows operators to raise the maximum allowed _id length beyond the current hard-coded 512-byte default, up to Lucene's MAX_TERM_LENGTH (32766 bytes). This enables workloads with naturally long identifiers (metric paths, URLs, composite keys) to use them as _id directly, avoiding the need to hash the identifier and store the original in a separate field.

The implementation uses two-tier validation:
- Request level: rejects IDs exceeding the Lucene hard limit (32766 bytes) for early failure before routing.
- Shard level: enforces the per-index index.max_doc_id_length setting (default 512) from TransportShardBulkAction, where index settings are available.

The setting is dynamic and index-scoped, so it can be changed on existing indices without a restart.

Signed-off-by: Sam Akrah <sakrah@uber.com>
Made-with: Cursor
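Both limits in the commit message above are byte lengths of the UTF-8 encoding, not character counts, so multibyte identifiers hit the ceiling sooner than their character length suggests. A minimal illustration (not code from this PR; names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class DocIdLength {
    // Limits described in the PR: per-index default and Lucene's hard ceiling.
    static final int DEFAULT_MAX_DOC_ID_LENGTH = 512;
    static final int LUCENE_MAX_TERM_LENGTH = 32766;

    // The _id limits apply to the UTF-8 encoded byte length, not the char count.
    static int idByteLength(String id) {
        return id.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(512);     // 512 chars -> 512 bytes: exactly at the default limit
        String multibyte = "é".repeat(300); // 300 chars -> 600 bytes: over the default limit
        System.out.println(idByteLength(ascii));     // 512
        System.out.println(idByteLength(multibyte)); // 600
    }
}
```

This is why a 300-character identifier can still be rejected under the 512-byte default.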
PR Reviewer Guide 🔍 (Review updated until commit 0b69a6c) Here are some key observations to aid the review process:
PR Code Suggestions ✨ Latest suggestions up to 0b69a6c. Explore these optional code suggestions:
Previous suggestions: suggestions up to commit 84f7372
Suggestions up to commit 20e35a3
❌ Gradle check result for 20e35a3: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…rch-project#20919) Signed-off-by: Sam Akrah <sakrah@uber.com>
Persistent review updated to latest commit 84f7372
Persistent review updated to latest commit 0b69a6c
❌ Gradle check result for 0b69a6c: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description

Introduces a new per-index setting index.max_doc_id_length that allows operators to raise the maximum allowed _id field length beyond the current hard-coded 512-byte default, up to Lucene's MAX_TERM_LENGTH (32766 bytes).

Motivation: The existing 512-byte _id limit was inherited from Elasticsearch for HTTP GET URL ergonomics, not for any technical or performance reason. Workloads with naturally long identifiers (metric paths, URLs, composite keys) are forced to hash these identifiers and store the originals in a separate field, adding storage overhead, query complexity, and write-path logic. Making the limit configurable removes this workaround entirely.

Implementation (two-tier validation):
- Request level (IndexRequest / UpdateRequest): rejects IDs exceeding the Lucene hard limit (32766 bytes) for early failure before routing.
- Shard level (TransportShardBulkAction): enforces the per-index index.max_doc_id_length setting (default 512), where index settings are available.

Key design decisions:
- Dynamic and IndexScope: the setting can be changed on existing indices without a restart.
- Upper bound is Lucene's MAX_TERM_LENGTH (32766 bytes), the absolute physical limit.

Files changed:
- IndexSettings.java: MAX_DOC_ID_LENGTH_SETTING with default 512, min 512, max 32766
- IndexScopedSettings.java
- DocWriteRequest.java: validateDocIdLength(id, maxLength, ...) + added constants
- IndexRequest.java / UpdateRequest.java
- TransportShardBulkAction.java
- BulkIntegrationIT.java
- IndexRequestTests.java / BulkRequestTests.java
- CHANGELOG.md: entry under [Unreleased 3.x] > Added

Related Issues
Resolves #19075
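The two-tier validation described in the PR can be sketched in plain Java. This is an illustrative sketch, not the PR's actual code: class and method names are hypothetical, and only the split between the absolute Lucene limit (checked early, before routing) and the configurable per-index limit (checked where settings are resolved) is taken from the description above.

```java
import java.nio.charset.StandardCharsets;

public class TwoTierIdValidation {
    // Hard ceiling from Lucene; the only limit known before index settings are resolved.
    static final int LUCENE_MAX_TERM_LENGTH = 32766;

    // Tier 1 (request level): fail fast before routing, using only the absolute limit.
    static void validateAtRequestLevel(String id) {
        int len = id.getBytes(StandardCharsets.UTF_8).length;
        if (len > LUCENE_MAX_TERM_LENGTH) {
            throw new IllegalArgumentException(
                "id exceeds Lucene's maximum term length: " + len + " > " + LUCENE_MAX_TERM_LENGTH);
        }
    }

    // Tier 2 (shard level): enforce the per-index index.max_doc_id_length value
    // (default 512), which is only available once the request reaches the shard.
    static void validateAtShardLevel(String id, int maxDocIdLength) {
        int len = id.getBytes(StandardCharsets.UTF_8).length;
        if (len > maxDocIdLength) {
            throw new IllegalArgumentException(
                "id is too long, must be no longer than " + maxDocIdLength
                    + " bytes but was: " + len);
        }
    }

    public static void main(String[] args) {
        validateAtRequestLevel("metrics.service.region.host.cpu.user"); // passes tier 1
        try {
            validateAtShardLevel("x".repeat(513), 512); // 513 bytes > per-index limit of 512
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Splitting the checks this way means malformed requests are rejected cheaply at the coordinator, while the configurable limit is applied exactly once, at the shard, where the index settings are authoritative.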
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
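Because the setting is dynamic and index-scoped, it can be raised on an existing index through the update index settings API without a restart. A sketch of such a request (the index name `my-index` and the value 1024 are illustrative; the setting key comes from this PR):

```
PUT /my-index/_settings
{
  "index.max_doc_id_length": 1024
}
```

Values outside the range described above (below the 512 default or above Lucene's 32766-byte MAX_TERM_LENGTH) would be rejected by the setting's bounds.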