Feat/hunspell ref path by shayush622 · Pull Request #20792 · opensearch-project/OpenSearch

shayush622 · 2026-03-06T18:24:38Z

Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries
Add RestHunspellCacheInvalidateAction for cache invalidation endpoint
Update HunspellService with cache management methods
Add ref_path validation in MetadataCreateIndexService

Description

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter, enabling multi-tenant dictionary isolation and hot-reload capabilities.

Related Issues

Resolves #[20712]
Link to #20712 (comment).

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

- Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries - Add RestHunspellCacheInvalidateAction for cache invalidation endpoint - Update HunspellService with cache management methods - Add ref_path validation in MetadataCreateIndexService Signed-off-by: shayush622 <ayush5267@gmail.com>

- Add ref_path parameter to hunspell token filter for package-based dictionaries - Load dictionaries from config/packages/{packageId}/hunspell/{locale}/ - Add cache invalidation REST API endpoints: - GET /_hunspell/cache (view cached dictionaries) - POST /_hunspell/cache/_invalidate (invalidate by package_id, locale, or cache_key) - POST /_hunspell/cache/_invalidate_all (clear all cached dictionaries) - Add TransportAction with cluster:admin/hunspell/cache/clear permission - Add path traversal security validation for packageId and locale - Add updateable flag support for hot-reload via _reload_search_analyzers - Add comprehensive test coverage for HunspellTokenFilterFactory and REST endpoint - Remove unused index-level ref_path setting Addresses PR feedback: - Removed mise.toml from commit (local dev config) - Added TransportAction authorization pattern - Added 21 unit tests for REST endpoint - Fixed security validation for path traversal attacks - Documented race condition in invalidateAllDictionaries Signed-off-by: shayush622 <ayush5267@gmail.com>

- Add ref_path parameter for package-based dictionary loading - Load from config/packages/{packageId}/hunspell/{locale}/ - Add cache invalidation REST API (GET/POST /_hunspell/cache/_invalidate) - Add TransportAction with cluster:admin permission - Add comprehensive security validation (path traversal, separators, cache-key injection) - Add updateable flag for hot-reload via _reload_search_analyzers - Add comprehensive test coverage PR feedback addressed: - Stricter validate() to reject conflicting params - Path traversal checks now use config/packages/ as base - ref_path/locale validation rejects ., .., /, \, : characters Signed-off-by: shayush622 <ayush5267@gmail.com>

Signed-off-by: shayush622 <ayush5267@gmail.com>

- Add ref_path parameter for package-based dictionary loading - Load from config/packages/{packageId}/hunspell/{locale}/ - Add cache info API: GET /_hunspell/cache (cluster:monitor/hunspell/cache) - Add cache invalidation API: POST /_hunspell/cache/_invalidate (cluster:admin/hunspell/cache/invalidate) - Support invalidation by package_id, locale, cache_key, or invalidate_all - Add security validation (path traversal, separator injection, null bytes) - Add updateable flag for hot-reload via _reload_search_analyzers - Use Strings.hasText() and Strings.isNullOrEmpty() for validation consistency - Consistent response schema with all fields always present - Add unit tests, REST handler tests, and integration tests Signed-off-by: shayush622 <ayush5267@gmail.com>

github-actions · 2026-03-06T18:24:50Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3917262.

'Diff too large, requires skip by maintainers after manual review'

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-06T18:37:17Z

PR Reviewer Guide 🔍

(Review updated until commit `385793b`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns Path traversal mitigation: The loadDictionaryFromPackage() method includes path traversal checks, but the locale parameter is also resolved as a path (packageHunspellDir.resolve(locale)) and only checked against the hunspell directory after resolution. While there is a startsWith check, the validatePackageIdentifier() in HunspellTokenFilterFactory (which rejects '/', '\', '..') is not called from within HunspellService itself. If getDictionaryFromPackage() is called directly (not through HunspellTokenFilterFactory), the locale parameter could potentially contain path traversal sequences like "../../../etc" that might pass the startsWith check depending on normalization behavior. It is recommended to add input validation directly in getDictionaryFromPackage() and loadDictionaryFromPackage() rather than relying solely on callers to validate.
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Add ref_path support for package-based hunspell dictionary loading Relevant files: server/src/main/java/org/opensearch/indices/analysis/HunspellService.java server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff Sub-PR theme: Add Hunspell cache invalidation REST API and transport actions Relevant files: server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoRequest.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoResponse.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInfoAction.java server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/ActionModule.java server/src/main/java/org/opensearch/node/Node.java server/src/test/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateActionTests.java server/src/internalClusterTest/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheIT.java
⚡ Recommended focus areas for review Null HunspellService In ActionModule.configure(), HunspellService is only bound if non-null. However, TransportHunspellCacheInvalidateAction and TransportHunspellCacheInfoAction both require HunspellService via @Inject. If hunspellService is null, these transport actions will fail at runtime with an injection error rather than a clear error message. Race Condition In invalidateDictionariesByPackage(), the size is captured before removeIf, but ConcurrentHashMap.removeIf is not atomic with respect to size(). The count calculation (sizeBefore - dictionaries.size()) may be inaccurate in concurrent scenarios. While the comment acknowledges approximate counts for logging, this could mislead operators about actual invalidation counts returned via the API. String prefix = packageId + CACHE_KEY_SEPARATOR; // Match keys like "pkg-1234:en_US" int sizeBefore = dictionaries.size(); dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); int count = sizeBefore - dictionaries.size(); Resource Leak In loadDictionaryFromPackage(), if an exception occurs after opening dicStreams but before or during the try-with-resources block for the NIOFSDirectory, the affixStream may not be closed if it was opened before the exception. The finally block closes both, but if affixStream assignment itself throws, dicStreams may not be closed. InputStream affixStream = null; List<InputStream> dicStreams = new ArrayList<>(dicFiles.length); try { for (Path dicFile : dicFiles) { dicStreams.add(Files.newInputStream(dicFile)); } affixStream = Files.newInputStream(affixFiles[0]); try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); } } catch (Exception e) { logger.error( () -> new ParameterizedMessage("Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e ); throw e; } finally { IOUtils.close(affixStream); IOUtils.close(dicStreams); } Path-based Routing The invalidate_all detection relies on request.path().endsWith("/_invalidate_all"), which is fragile. If the path has trailing slashes or query parameters affect path resolution, this check could fail silently and result in no invalidation action being taken (empty request with no parameters set, which would fail validation). if (request.path().endsWith("/_invalidate_all")) { invalidateRequest.setInvalidateAll(true); } else { String packageId = request.param("package_id"); String locale = request.param("locale"); String cacheKey = request.param("cache_key"); invalidateRequest.setPackageId(packageId); invalidateRequest.setLocale(locale); invalidateRequest.setCacheKey(cacheKey); } Cache Key Collision The isPackageCacheKey() method uses colon (':') as the separator to distinguish package-based keys from traditional locale keys. However, if a traditional locale name somehow contains a colon (unlikely but possible in edge cases), it would be misidentified as a package-based key. The validatePackageIdentifier() method in HunspellTokenFilterFactory rejects colons, but getDictionary() (traditional path) does not call this validation. public static boolean isPackageCacheKey(String cacheKey) { return cacheKey != null && cacheKey.contains(CACHE_KEY_SEPARATOR); }

github-actions · 2026-03-06T18:38:02Z

PR Code Suggestions ✨

Latest suggestions up to 385793b

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix non-atomic removal count in concurrent map Using `sizeBefore - dictionaries.size()` to count removed entries is not atomic on a `ConcurrentHashMap` and can yield incorrect counts if concurrent modifications occur between the two `size()` calls. Instead, count the removals directly inside the `removeIf` predicate using an `AtomicInteger`. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removedCount = count.get(); Suggestion importance[1-10]: 6 __ Why: The non-atomic size difference calculation between two `ConcurrentHashMap.size()` calls can yield incorrect counts under concurrent access. Using an `AtomicInteger` inside `removeIf` is a valid and more accurate approach for counting actual removals.	Low
	Prevent silent injection failure for null service If `hunspellService` is null, `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at injection time with a cryptic error. The binding should always be present, or the transport actions should handle a null/optional service gracefully. Consider throwing an explicit error or always providing a non-null `HunspellService`. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [1129-1131] -if (hunspellService != null) { - bind(HunspellService.class).toInstance(hunspellService); +if (hunspellService == null) { + throw new IllegalStateException("HunspellService must not be null when registering hunspell cache actions"); } +bind(HunspellService.class).toInstance(hunspellService); Suggestion importance[1-10]: 5 __ Why: The suggestion points to code in `ActionModule.java` but the `existing_code` references `HunspellService.java`. The actual code is in `ActionModule.java` at the `configure()` method. The concern is valid — a null `hunspellService` would cause injection failures — but the suggestion's file reference is wrong. Additionally, in tests `null` is explicitly passed, so always throwing would break tests.	Low
	Avoid double-closing streams after successful dictionary load If `Files.newInputStream(dicFile)` succeeds for some files but then throws for a later file, the already-opened streams are added to `dicStreams` but `affixStream` is still null. The `finally` block will close the partial `dicStreams` correctly, but if an exception occurs inside the `Dictionary` constructor after streams are opened, the streams are closed twice (once by `Dictionary` and once by `finally`). More critically, if opening any `dicFile` stream fails mid-loop, the partially-filled `dicStreams` list is still closed in `finally`, which is correct—but `affixStream` is opened after all dic streams, so a failure there leaves all dic streams open until `finally`. This ordering is fine, but the `Dictionary` constructor may close the streams itself on success, leading to double-close in `finally`. Consider using a flag or nulling streams after handing them to `Dictionary`. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [315-336] InputStream affixStream = null; List<InputStream> dicStreams = new ArrayList<>(dicFiles.length); +boolean success = false; try { for (Path dicFile : dicFiles) { dicStreams.add(Files.newInputStream(dicFile)); } affixStream = Files.newInputStream(affixFiles[0]); try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { - return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); + Dictionary dict = new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); + success = true; + return dict; } } catch (Exception e) { - ... + logger.error( + () -> new ParameterizedMessage("Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), + e + ); + throw e; } finally { - IOUtils.close(affixStream); - IOUtils.close(dicStreams); + if (!success) { + IOUtils.close(affixStream); + IOUtils.close(dicStreams); + } } Suggestion importance[1-10]: 5 __ Why: The concern about potential double-closing of streams is valid if `Dictionary` closes streams internally on success. Using a `success` flag to conditionally close streams in `finally` is a reasonable defensive pattern, though the actual impact depends on `Dictionary`'s implementation.	Low
Security	Strengthen path traversal detection to cover embedded dots The check `value.startsWith(".")` already covers `value.equals(".")` and `value.equals("..")`, and `value.contains("./")` / `value.contains("../")` are subsumed by the path separator check below. More importantly, the regex-like strings `"\."` and `"\.\."` are checking for literal backslash-dot sequences, not regex escapes—this is correct for `String.contains`, but the backslash check is already handled by the path separator rejection. The redundant conditions add confusion without extra safety. More critically, a value like `"foo..bar"` (double dot embedded) is not caught by any of these checks but could still be problematic depending on OS path resolution. Consider normalizing and checking the resolved path instead. server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +if (value.startsWith(".") \|\| value.endsWith(".") \|\| value.contains("..")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value) + ); +} Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `value.contains("..")` would catch embedded double-dot sequences like `"foo..bar"` that the current checks miss. The simplified logic is cleaner and more comprehensive for path traversal prevention, which is a security concern.	Low
General	Remove duplicate changelog entries There are two identical changelog entries with different PR numbers. These appear to be duplicate entries describing the same feature. Consider consolidating them into a single entry or differentiating the descriptions to accurately reflect what each PR contributes. CHANGELOG.md [42-43] -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) +- Add ref_path support for package-based hunspell dictionary loading ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) +- Add cache invalidation API for package-based hunspell dictionary with ref_path support ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) Suggestion importance[1-10]: 6 __ Why: The two changelog entries have identical descriptions but different PR numbers, which is confusing. Differentiating them to accurately reflect each PR's contribution would improve clarity, though this is a minor documentation issue.	Low

Previous suggestions

Suggestions up to commit 190bfd1

Category	Suggestion	Impact
General	Fix inaccurate concurrent removal count In a concurrent environment, `sizeBefore - dictionaries.size()` can yield a negative or incorrect count because other threads may add or remove entries between the two `size()` calls. Use an atomic counter incremented inside the `removeIf` predicate to get an accurate count. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removedCount = count.get(); Suggestion importance[1-10]: 7 __ Why: The `sizeBefore - dictionaries.size()` approach is genuinely racy in a `ConcurrentHashMap` and can yield incorrect (even negative) counts. Using an `AtomicInteger` inside the `removeIf` predicate is the correct fix and directly addresses a real concurrency bug.	Medium
	Deduplicate identical changelog entries There are two identical changelog entries with different PR numbers. These should have distinct descriptions to differentiate what each PR contributes. Having duplicate descriptions makes the changelog misleading and harder to understand. CHANGELOG.md [42-43] -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) +- Add ref_path support for package-based hunspell dictionary loading ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) +- Add cache invalidation API support for package-based hunspell dictionary with ref_path ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) Suggestion importance[1-10]: 6 __ Why: The two changelog entries have identical descriptions but different PR numbers, which is misleading. Differentiating the descriptions would improve clarity, though the exact correct descriptions depend on what each PR actually implements.	Low
	Fix repeated disk reads on failed dictionary load `ConcurrentHashMap.computeIfAbsent` does not allow the mapping function to throw checked exceptions, but it also caches failed computations differently across JVM versions. If `loadDictionaryFromPackage` throws, the `IllegalStateException` is propagated but the key may remain absent, causing repeated expensive disk reads on every subsequent call. Consider caching a sentinel or using a separate loading mechanism to avoid repeated failures. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-190] -return dictionaries.computeIfAbsent(cacheKey, (key) -> { - try { - return loadDictionaryFromPackage(packageId, locale); - } catch (Exception e) { +Dictionary cached = dictionaries.get(cacheKey); +if (cached != null) { + return cached; +} +try { + Dictionary loaded = loadDictionaryFromPackage(packageId, locale); + Dictionary existing = dictionaries.putIfAbsent(cacheKey, loaded); + return existing != null ? existing : loaded; +} catch (Exception e) { + throw new IllegalStateException( + String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), + e + ); +} - throw new IllegalStateException( - String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), - e - ); - } -}); - Suggestion importance[1-10]: 5 __ Why: The concern about repeated disk reads on failure is valid, but `ConcurrentHashMap.computeIfAbsent` in Java 8+ does not cache failed computations (the key remains absent on exception), so every failed call will retry the disk read. The suggested alternative using `get`/`putIfAbsent` is a reasonable improvement for correctness in concurrent scenarios.	Low
Possible issue	Prevent silent null binding for required service If `hunspellService` is null, `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at runtime with a Guice injection error since they require `HunspellService`. The null check silently skips binding, which will cause an obscure failure rather than a clear error. Either always bind the service or throw an explicit exception when it's null. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1129-1131] -if (hunspellService != null) { - bind(HunspellService.class).toInstance(hunspellService); +if (hunspellService == null) { + throw new IllegalStateException("HunspellService must not be null when registering hunspell cache actions"); } +bind(HunspellService.class).toInstance(hunspellService); Suggestion importance[1-10]: 6 __ Why: The null check silently skips binding `HunspellService`, which would cause an obscure Guice injection error at runtime when `TransportHunspellCacheInvalidateAction` or `TransportHunspellCacheInfoAction` are invoked. However, the test code explicitly passes `null` for `hunspellService`, suggesting the null case is intentional for test scenarios, so throwing unconditionally could break tests.	Low
Security	Fix incomplete path traversal detection The path traversal check is incomplete: it misses cases like `foo/..` (ends with `..`), `foo/../bar` (contains `/../`), and Windows-style `foo..` sequences. Since path separators are already rejected in the next check, the traversal check is partially redundant but still incomplete for edge cases like a value that is exactly `..`. Rely on the path separator check as the primary defense and simplify the traversal check to cover the remaining cases. server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +if (value.equals(".") \|\| value.equals("..") \|\| value.startsWith(".") \|\| value.endsWith(".") \|\| value.contains("..")) { Suggestion importance[1-10]: 6 __ Why: The existing traversal check misses patterns like `foo/..` or `foo/../bar`, though the subsequent path separator check (`/` and `\`) provides a strong secondary defense. The simplified check `value.contains("..")` would catch more cases, but since path separators are already rejected, the remaining gap is limited to edge cases like a bare `..` value which is already covered.	Low

Suggestions up to commit dbc150a

Category	Suggestion	Impact
General	Fix inaccurate concurrent removal count `dictionaries` is a `ConcurrentHashMap`, and `removeIf` on its `keySet()` is not atomic with respect to `size()`. Between `sizeBefore = dictionaries.size()` and the `removeIf`, new entries matching the prefix could be added or removed by concurrent threads, making `count` inaccurate. While the comment acknowledges approximate counts for `invalidateAllDictionaries`, this method returns the count as a meaningful result used by callers. Consider using an atomic counter within the `removeIf` predicate instead. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removed = count.get(); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies a real concurrency issue where `sizeBefore` and `removeIf` are not atomic, making the returned `count` potentially inaccurate. Using an `AtomicInteger` within the predicate is a valid fix that provides a more accurate count of actually removed entries.	Low
	Deduplicate identical changelog entries There are two identical changelog entries describing the same feature but referencing different PR numbers. These should be deduplicated or differentiated to accurately describe what each PR contributes. Having duplicate descriptions makes the changelog misleading and harder to understand. CHANGELOG.md [40-41] -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) +- Add ref_path support for package-based hunspell dictionary loading ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) +- Add cache invalidation API for package-based hunspell dictionary with ref_path support ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) Suggestion importance[1-10]: 6 __ Why: The two changelog entries have identical descriptions but reference different PR numbers (#20741 and #20792). This is misleading as it suggests both PRs do exactly the same thing. Differentiating the descriptions would improve clarity and accuracy of the changelog.	Low
	Validate REST parameters before building invalidate request The `buildInvalidateRequest` method is called for all POST requests, including those to `/_invalidate_all`. However, if a client sends a POST to `/_invalidate_all` with `package_id` or `cache_key` query parameters, those are silently ignored. More importantly, the request validation in `HunspellCacheInvalidateRequest.validate()` would reject a request where `invalidateAll=true` AND `packageId != null`, but since the REST handler ignores params for `_invalidate_all`, this conflict is masked rather than properly handled. This is acceptable behavior but should be explicitly documented, or params should be rejected with a 400 error. server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java [106-116] if (request.path().endsWith("/_invalidate_all")) { invalidateRequest.setInvalidateAll(true); + // Ignore any package_id/cache_key params for _invalidate_all endpoint } else { String packageId = request.param("package_id"); String locale = request.param("locale"); String cacheKey = request.param("cache_key"); + + if (packageId == null && locale == null && cacheKey == null) { + throw new IllegalArgumentException( + "At least one of 'package_id', 'cache_key' must be specified for /_invalidate" + ); + } invalidateRequest.setPackageId(packageId); invalidateRequest.setLocale(locale); invalidateRequest.setCacheKey(cacheKey); } Suggestion importance[1-10]: 3 __ Why: The suggestion points out that missing parameters for `/_invalidate` are silently passed through to validation, but the `HunspellCacheInvalidateRequest.validate()` already handles this case and returns a proper validation error. Adding early validation in the REST handler is redundant and the `improved_code` throws an unchecked exception rather than returning a proper 400 response.	Low
Possible issue	Prevent silent injection failure for null service If `hunspellService` is null, `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at injection time with a cryptic error. The binding should always be present, or the transport actions should handle a null/optional service gracefully. Consider throwing an explicit error or ensuring `hunspellService` is never null when these actions are registered. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [1129-1131] -if (hunspellService != null) { - bind(HunspellService.class).toInstance(hunspellService); +if (hunspellService == null) { + throw new IllegalStateException("HunspellService must not be null when hunspell cache actions are registered"); } +bind(HunspellService.class).toInstance(hunspellService); Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that a null `hunspellService` would cause a cryptic injection failure for `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction`. However, the `existing_code` snippet is from `ActionModule.java` not `HunspellService.java` as stated in the suggestion's `relevant_file`. The logic is valid but the impact is moderate since tests already pass `null` explicitly.	Low
Security	Fix incorrect path traversal detection logic The path traversal check is incomplete and redundant. The `value.startsWith(".")` condition already covers `value.equals(".")` and `value.equals("..")`, but sequences like `foo/..` or `foo/../bar` are not caught here — they are only caught by the path separator check. However, a value like `foo..bar` would incorrectly be rejected by `value.endsWith(".")` only if it ends with a dot. More critically, `value.contains("\.")` uses a regex-style escape but `String.contains` does literal matching, so `\.` matches the literal two-character sequence `.` rather than a dot — this is likely a bug. Simplify and fix the check. server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167] if (value.equals(".") \|\| value.equals("..") + \|\| value.startsWith(".") + \|\| value.endsWith(".") \|\| value.contains("./") \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { + \|\| value.contains("\\.") // literal backslash-dot + \|\| value.contains("\\..")) { // literal backslash-dot-dot Suggestion importance[1-10]: 4 __ Why: The suggestion raises a valid point about `value.contains("\\\\.")` using a literal string match for `\\.` rather than a backslash followed by a dot, but the `improved_code` is essentially the same as the `existing_code` with only comment changes, making it not a meaningful fix. The actual bug identified is real but the proposed fix doesn't address it properly.	Low

Suggestions up to commit 33f4a54

Category	Suggestion	Impact
Possible issue	Avoid blocking inside computeIfAbsent on load failure `ConcurrentHashMap.computeIfAbsent` must not throw checked or unchecked exceptions that leave a partial mapping; if the supplier throws, the key is not stored, but a concurrent caller may already be inside `computeIfAbsent` for the same key and will block indefinitely on some JVM versions. Wrap the call outside `computeIfAbsent` and use a two-step get-then-compute pattern to avoid this hazard. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-190] -return dictionaries.computeIfAbsent(cacheKey, (key) -> { - try { - return loadDictionaryFromPackage(packageId, locale); - } catch (Exception e) { +Dictionary existing = dictionaries.get(cacheKey); +if (existing != null) { + return existing; +} +Dictionary loaded; +try { + loaded = loadDictionaryFromPackage(packageId, locale); +} catch (Exception e) { + throw new IllegalStateException( + String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), + e + ); +} +Dictionary previous = dictionaries.putIfAbsent(cacheKey, loaded); +return previous != null ? previous : loaded; - throw new IllegalStateException( - String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), - e - ); - } -}); - Suggestion importance[1-10]: 6 __ Why: This is a valid concern - `ConcurrentHashMap.computeIfAbsent` can cause issues when the mapping function throws exceptions, potentially causing threads to block. The proposed get-then-putIfAbsent pattern is a safer alternative, though it may load the dictionary twice under high concurrency.	Low
Possible issue	Prevent null HunspellService injection failure If `hunspellService` is null, `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at injection time with a cryptic error. Either make `hunspellService` a required parameter (non-null) or provide a fallback binding so the injector always has a binding for `HunspellService`. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1129-1131] -if (hunspellService != null) { - bind(HunspellService.class).toInstance(hunspellService); +if (hunspellService == null) { + throw new IllegalArgumentException("HunspellService must not be null"); } +bind(HunspellService.class).toInstance(hunspellService); Suggestion importance[1-10]: 5 __ Why: The suggestion is valid in principle - if `hunspellService` is null, the transport actions that depend on it will fail at injection time. However, the test code explicitly passes `null` for `hunspellService` in `ActionModuleTests`, suggesting null is an intentional supported state. The relevant file in the suggestion is wrong (`ActionModule.java` vs the actual file path), but the code lines match the `ActionModule.java` content in the diff.	Low
General	Deduplicate identical changelog entries There are two identical changelog entries describing the same feature but referencing different PR numbers. These should be deduplicated or differentiated to accurately describe what each PR contributes. Having duplicate entries with the same description is misleading and clutters the changelog. CHANGELOG.md [40-41] -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) -- Add ref_path support for package-based hunspell dictionary loading and cache invalidation API ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) +- Add ref_path support for package-based hunspell dictionary loading ([#20741](https://github.com/opensearch-project/OpenSearch/pull/20741)) +- Add cache invalidation API for package-based hunspell dictionary with ref_path support ([#20792](https://github.com/opensearch-project/OpenSearch/pull/20792)) Suggestion importance[1-10]: 6 __ Why: The two changelog entries have identical descriptions but different PR numbers, which is misleading. Differentiating them to accurately describe each PR's contribution improves changelog clarity and accuracy.	Low
General	Fix inaccurate concurrent removal count `ConcurrentHashMap.keySet().removeIf()` is not atomic, so `sizeBefore` and the post-removal size can be inconsistent under concurrent access, leading to a negative or incorrect `count`. Collect the removed keys explicitly to get an accurate count. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +int[] count = {0}; +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count[0]++; + return true; + } + return false; +}); +int result = count[0]; Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies a race condition between `size()` and `removeIf()` in `ConcurrentHashMap`. The improved code using an atomic counter inside `removeIf` is more accurate. However, the code itself already acknowledges approximate counts in similar methods, and the impact is limited to logging accuracy.	Low
Security	Simplify and close path traversal validation gaps The check `value.startsWith(".")` already covers `value.equals(".")` and `value.equals("..")`, and `value.contains("./")` / `value.contains("../")` are subsumed by the path-separator check below. More importantly, a value like `"foo..bar"` passes all these checks but still resolves to a path traversal on some filesystems. Rely on the server-side `normalize()` + `startsWith()` check already present in `loadDictionaryFromPackage` and simplify this to just reject separators, null bytes, and leading/trailing dots. server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-171] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +if (value.startsWith(".") \|\| value.endsWith(".") \|\| value.contains("..")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value) + ); +} Suggestion importance[1-10]: 4 __ Why: The suggestion correctly notes redundancy in the path traversal checks (e.g., `startsWith(".")` covers `equals(".")` and `equals("..")`). However, the server-side `normalize()` + `startsWith()` check in `loadDictionaryFromPackage` already provides a strong safety net, making this a minor improvement rather than a critical fix.	Low

Suggestions up to commit 3917262

Category	Suggestion	Impact
Possible issue	Prevent silent injection failure for null service If `hunspellService` is null, `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at injection time with a cryptic error. The binding should always be present, or the transport actions should handle a null/optional service gracefully. Consider throwing an explicit error or ensuring `hunspellService` is never null when these actions are registered. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [1129-1131] -if (hunspellService != null) { - bind(HunspellService.class).toInstance(hunspellService); +if (hunspellService == null) { + throw new IllegalStateException("HunspellService must not be null when hunspell cache actions are registered"); } +bind(HunspellService.class).toInstance(hunspellService); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that a null `hunspellService` would cause a cryptic injection failure for `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction`. However, the existing code in `ActionModule.java` (not `HunspellService.java` as stated in the suggestion) handles this with a null check, and the test code explicitly passes `null` for `hunspellService`, suggesting null is an intentional valid state. The suggestion's `relevant_file` is also incorrect.	Low
Possible issue	Fix non-atomic removal count in concurrent map Using `sizeBefore - dictionaries.size()` on a `ConcurrentHashMap` is not atomic and can yield incorrect counts in concurrent scenarios. While the comment acknowledges this for `invalidateAllDictionaries`, here the count is used as a return value that callers may rely on for correctness. Consider tracking removals directly inside `removeIf` using an `AtomicInteger`. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removed = count.get(); +// use removed instead of count Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies a race condition in `invalidateDictionariesByPackage` where `sizeBefore - dictionaries.size()` is non-atomic on a `ConcurrentHashMap`. Using an `AtomicInteger` inside `removeIf` is a valid fix for more accurate counting in concurrent scenarios.	Low
Security	Fix incomplete path traversal detection The path traversal check is incomplete and can be bypassed. For example, a value like `foo/..` or `..foo` would not be caught by the current conditions. Since path separators (`/`, `</code>) are already rejected in a subsequent check, the dot-sequence checks are partially redundant but still miss edge cases. The most reliable approach is to normalize the value as a path and verify it has no parent components.` server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +// Reject path traversal attempts - check after rejecting separators +if (value.equals(".") \|\| value.equals("..") \|\| value.contains("..")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value) + ); +} Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that the path traversal check has gaps (e.g., ..foo or foo/.. patterns). The improved code simplifies the check to value.contains("..") which is more comprehensive, and since path separators are already rejected separately, this is a valid security improvement.	Low
General	Remove misleading empty comment in exception handler `ConcurrentHashMap.computeIfAbsent` must not throw checked exceptions, but wrapping in `IllegalStateException` is correct. However, if the loading throws an unchecked exception, the key will NOT be stored in the map, but the partially-executed lambda may leave resources open. More critically, if `loadDictionaryFromPackage` throws, the exception propagates out of `computeIfAbsent` without caching, so every subsequent call retries the load — this is acceptable, but the empty catch block comment should be removed to avoid confusion. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-190] return dictionaries.computeIfAbsent(cacheKey, (key) -> { try { return loadDictionaryFromPackage(packageId, locale); } catch (Exception e) { - throw new IllegalStateException( String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), e ); } }); Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies an empty line/comment between `catch (Exception e) {` and the `throw` statement that could be confusing. The `improved_code` and `existing_code` are functionally equivalent with only whitespace differences, making this a minor style improvement.	Low
	Remove trailing whitespace in affix rules The trailing whitespace after `y` and `[^ey]` in these SFX rules may cause parsing issues with some Hunspell implementations that are strict about line endings. Remove the trailing spaces to ensure consistent behavior across different Hunspell parsers. server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff [44-45] -SFX N y ication y +SFX N y ication y SFX N 0 en [^ey] Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies trailing whitespace in the `SFX N` rules at lines 44-45. While trailing whitespace could theoretically cause parsing issues in strict Hunspell implementations, this is a test resource file and the impact is minimal. The `improved_code` accurately reflects the change.	Low
	Remove trailing whitespace in suffix rule Similar to the other SFX rules, this line has trailing whitespace after `[^e]` which could cause parsing inconsistencies. Remove the trailing space for consistency and to avoid potential issues. server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff [61] Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies trailing whitespace in the `SFX G` rule at line 61. Similar to suggestion 1, this is a minor style issue in a test resource file with minimal practical impact.	Low

github-actions · 2026-03-06T18:46:11Z

Persistent review updated to latest commit 33f4a54

github-actions · 2026-03-06T20:16:36Z

❌ Gradle check result for 33f4a54: FAILURE