feat(analysis): Add package-based Hunspell dictionary support with ref_path parameter and cache invalidation API by shayush622 · Pull Request #20741 · opensearch-project/OpenSearch

shayush622 · 2026-02-27T11:18:47Z

Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries
Add RestHunspellCacheInvalidateAction for cache invalidation endpoint
Update HunspellService with cache management methods
Add ref_path validation in MetadataCreateIndexService

Description

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter, enabling multi-tenant dictionary isolation and hot-reload capabilities.

Related Issues

Resolves #[20712]
Link to RFC.

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-02-27T11:19:17Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 0e293c9.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java	1	high	A new unauthenticated REST endpoint (/_hunspell/cache/_invalidate_all) is registered directly in Node.java with a comment explicitly stating it is 'for hunspell invalidation cache testing'. This endpoint exposes cache management operations (including full cache wipe) via HTTP without any authorization check or role-based access control, and bypasses the normal plugin/action registration framework. An attacker with network access to the node could repeatedly invalidate all dictionaries causing denial-of-service or resource exhaustion. The 'testing' comment strongly suggests this was not intended for production but was left in deliberately.
server/src/main/java/org/opensearch/node/Node.java	1676	medium	The REST handler registration is done inline in Node bootstrap outside of the standard REST action plugin registration path, bypassing any framework-level security enforcement (e.g., transport-layer action filters, permission checks). The inline comment '// Add for hunspell invalidation cache testing:' signals this is debug/test infrastructure committed to production code, which is a suspicious pattern that could be a backdoor or intentional weakening of the security boundary around cache management.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 1 | Medium: 1 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Copilot

Pull request overview

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter, enabling multi-tenant dictionary isolation and hot-reload capabilities through a new cache management REST API.

Changes:

Introduces package-based dictionary loading with ref_path parameter alongside traditional locale-based loading
Adds REST API endpoints for Hunspell cache management (view, invalidate by package/key, invalidate all)
Implements cache invalidation methods in HunspellService for hot-reload support
Adds index.ref_path index-level setting with validation (though usage is unclear)

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.

Show a summary per file

File	Description
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java	Adds getDictionaryFromPackage(), cache invalidation methods, and utility methods for package-based dictionary management
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java	New REST handler for cache management with GET/POST endpoints
server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java	Updates to support ref_path parameter and hot-reload via updateable flag
server/src/main/java/org/opensearch/index/IndexSettings.java	Adds INDEX_REF_PATH_SETTING as dynamic index-level setting
server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java	Adds validateRefPath() for index.ref_path validation
server/src/main/java/org/opensearch/node/Node.java	Registers RestHunspellCacheInvalidateAction
server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java	Makes getHunspellService() public and adds Environment parameter to HunspellTokenFilterFactory
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java	Registers INDEX_REF_PATH_SETTING
server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java	Adds comprehensive tests for package-based dictionary loading and cache management
mise.toml	Adds development environment configuration (unrelated to PR)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-27T11:27:50Z

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java

+                if (!refPath.startsWith("packages/")) {
+                    validationErrors.add("ref_path [" + refPath + "] must start with 'packages/'");
+                    return validationErrors;


The ref_path validation expects the value to start with "packages/" (e.g., "packages/pkg-1234"), but HunspellTokenFilterFactory at line 87 expects ref_path to be just the package ID (e.g., "pkg-1234") without the "packages/" prefix. This creates an inconsistency where the validation will reject valid usage patterns.

Looking at HunspellService.loadDictionaryFromPackage (lines 199-201), it constructs the path as config/packages/{packageId}, meaning it expects just the package ID, not the full path.

The validation should either:

Accept just the package ID (e.g., "pkg-1234") to match the usage, or

Update HunspellTokenFilterFactory and HunspellService to expect the full path with "packages/" prefix

Option 1 is recommended for consistency with the implementation and to avoid confusion for users.

Copilot · 2026-02-27T11:27:51Z

server/src/main/java/org/opensearch/node/Node.java

+            // Add for hunspell invalidation cache testing:
+            HunspellService hunspellService = analysisModule.getHunspellService();
+            restController.registerHandler(new RestHunspellCacheInvalidateAction(hunspellService));
+


REST handlers should typically be registered in ActionModule's constructor using the standard pattern with registerHandler.accept(), not directly in Node.java. This breaks the established pattern where all REST handlers are registered in one central location in ActionModule (around lines 850-1077 in ActionModule.java).

Consider moving this registration to ActionModule.java to maintain consistency with other REST handlers in the codebase. The HunspellService can be passed as a parameter to RestHunspellCacheInvalidateAction or retrieved from the AnalysisModule if needed.

Suggested change

// Add for hunspell invalidation cache testing:

HunspellService hunspellService = analysisModule.getHunspellService();

restController.registerHandler(new RestHunspellCacheInvalidateAction(hunspellService));

^ Let's address this.

ACK, addressed

Copilot · 2026-02-27T11:27:51Z

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

+        Path packageDir = env.configDir()
+            .resolve("packages")
+            .resolve(packageId);
+
+        // Security check: ensure path is under config directory
+        if (!packageDir.normalize().startsWith(env.configDir().toAbsolutePath())) {
+            throw new IllegalArgumentException(
+                String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)
+            );
+        }
+
+        // Check if package directory exists
+        if (!Files.isDirectory(packageDir)) {
+            throw new OpenSearchException(
+                String.format(Locale.ROOT, 
+                    "Package directory not found: [%s]. Expected at: %s", 
+                    packageId, packageDir)
+            );
+        }
+
+        // Auto-detect hunspell directory within package
+        Path hunspellDir = packageDir.resolve("hunspell");
+        if (!Files.isDirectory(hunspellDir)) {
+            throw new OpenSearchException(
+                String.format(Locale.ROOT, 
+                    "Hunspell directory not found in package [%s]. " +
+                    "Expected 'hunspell' subdirectory at: %s", 
+                    packageId, hunspellDir)
+            );
+        }
+
+        // Resolve locale directory within hunspell
+        Path dicDir = hunspellDir.resolve(locale);


The path traversal check at line 204 is not entirely correct. The check compares packageDir.normalize().startsWith(env.configDir().toAbsolutePath()), but packageDir is not converted to absolute path before normalization. This could allow path traversal attacks with inputs like "../../../etc" because:

packageDir is relative (config/packages/../../../etc)

After normalize(), it becomes a different relative path

The startsWith check may fail to detect the traversal

The check should be: if (!packageDir.toAbsolutePath().normalize().startsWith(env.configDir().toAbsolutePath().normalize())) to ensure both paths are absolute and normalized before comparison.

Additionally, the locale parameter (line 231: dicDir = hunspellDir.resolve(locale)) is not validated for path traversal. A malicious locale value like "../../../etc/passwd" could escape the hunspell directory. Add similar validation for the locale parameter.

Suggested change

Path packageDir = env.configDir()

.resolve("packages")

.resolve(packageId);

// Security check: ensure path is under config directory

if (!packageDir.normalize().startsWith(env.configDir().toAbsolutePath())) {

throw new IllegalArgumentException(

String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)

);

}

// Check if package directory exists

if (!Files.isDirectory(packageDir)) {

throw new OpenSearchException(

String.format(Locale.ROOT,

"Package directory not found: [%s]. Expected at: %s",

packageId, packageDir)

);

}

// Auto-detect hunspell directory within package

Path hunspellDir = packageDir.resolve("hunspell");

if (!Files.isDirectory(hunspellDir)) {

throw new OpenSearchException(

String.format(Locale.ROOT,

"Hunspell directory not found in package [%s]. " +

"Expected 'hunspell' subdirectory at: %s",

packageId, hunspellDir)

);

}

// Resolve locale directory within hunspell

Path dicDir = hunspellDir.resolve(locale);

Path configDir = env.configDir().toAbsolutePath().normalize();

Path packageDir = configDir

.resolve("packages")

.resolve(packageId);

Path packageDirAbs = packageDir.toAbsolutePath().normalize();

// Security check: ensure path is under config directory

if (!packageDirAbs.startsWith(configDir)) {

throw new IllegalArgumentException(

String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)

);

}

// Check if package directory exists

if (!Files.isDirectory(packageDirAbs)) {

throw new OpenSearchException(

String.format(Locale.ROOT,

"Package directory not found: [%s]. Expected at: %s",

packageId, packageDirAbs)

);

}

// Auto-detect hunspell directory within package

Path hunspellDir = packageDirAbs.resolve("hunspell");

Path hunspellDirAbs = hunspellDir.toAbsolutePath().normalize();

if (!Files.isDirectory(hunspellDirAbs)) {

throw new OpenSearchException(

String.format(Locale.ROOT,

"Hunspell directory not found in package [%s]. " +

"Expected 'hunspell' subdirectory at: %s",

packageId, hunspellDirAbs)

);

}

// Resolve locale directory within hunspell and validate against traversal

Path dicDir = hunspellDirAbs.resolve(locale);

Path dicDirAbs = dicDir.toAbsolutePath().normalize();

if (!dicDirAbs.startsWith(hunspellDirAbs)) {

throw new IllegalArgumentException(

String.format(Locale.ROOT, "Locale path must be under hunspell directory. Package: [%s], locale: [%s]", packageId, locale)

);

}

dicDir = dicDirAbs;

Copilot · 2026-02-27T11:27:51Z

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

+     * 
+     * @return count of invalidated cache entries
+     */
+    public int invalidateAllDictionaries() {
+        int count = dictionaries.size();
+        dictionaries.clear();
+        logger.info("Invalidated all {} cached hunspell dictionaries", count);


The invalidateAllDictionaries method has a race condition. Line 498 reads the size, then line 499 clears the map. Between these two operations, other threads could add or remove entries, making the returned count inaccurate. While this is a minor issue for a diagnostic return value, it could lead to confusion in logs.

Consider using int count = dictionaries.size(); dictionaries.clear(); return count; in a single atomic operation or accepting that the count may be approximate and documenting this behavior.

Suggested change

*

* @return count of invalidated cache entries

*/

public int invalidateAllDictionaries() {

int count = dictionaries.size();

dictionaries.clear();

logger.info("Invalidated all {} cached hunspell dictionaries", count);

* <p>

* Note: The returned count is based on the cache size observed just before clearing and may be

* approximate if other threads are concurrently adding or removing entries.

*

* @return approximate count of invalidated cache entries

*/

public int invalidateAllDictionaries() {

int count = dictionaries.size();

dictionaries.clear();

logger.info(

"Invalidated all cached hunspell dictionaries; previous observed cache size was {} (may be approximate due to concurrent updates)",

count

);

Copilot · 2026-02-27T11:27:51Z

server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java


 /**
 * The token filter factory for the hunspell analyzer
+ * * 


There's a double asterisk "* *" on line 46 which appears to be a formatting error in the Javadoc comment. Remove the extra asterisk to maintain proper documentation formatting.

Suggested change

* *

*

Copilot · 2026-02-27T11:27:53Z

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

+        }
+
+        // Resolve locale directory within hunspell
+        Path dicDir = hunspellDir.resolve(locale);


The locale parameter is resolved to a directory path (line 231) without validation for path traversal. A malicious locale value like "../../../etc/passwd" or "../../.." could escape the hunspell directory and access arbitrary files on the system.

Add validation after line 231 to ensure the resolved dicDir is still under the hunspellDir:

if (!dicDir.toAbsolutePath().normalize().startsWith(hunspellDir.toAbsolutePath().normalize())) { throw new IllegalArgumentException( String.format(Locale.ROOT, "Locale path must be under hunspell directory. Locale: [%s]", locale) ); }

This prevents directory traversal attacks via the locale parameter.

Suggested change

Path dicDir = hunspellDir.resolve(locale);

Path dicDir = hunspellDir.resolve(locale);

if (!dicDir.toAbsolutePath().normalize().startsWith(hunspellDir.toAbsolutePath().normalize())) {

throw new IllegalArgumentException(

String.format(Locale.ROOT, "Locale path must be under hunspell directory. Locale: [%s]", locale)

);

}

...rc/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java

Copilot · 2026-02-27T11:27:53Z

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

+     * @throws IllegalArgumentException if packageId or locale is null
+     * @throws OpenSearchException if hunspell directory not found or dictionary cannot be loaded
+     */
+    public Dictionary getDictionaryFromPackage(String packageId, String locale, Environment env) {


The getDictionaryFromPackage method requires an Environment parameter, but HunspellService already has access to the environment from its constructor (used to resolve hunspellDir at line 125). This creates inconsistency:

getDictionary() uses the environment from the constructor

getDictionaryFromPackage() requires environment to be passed as a parameter

This inconsistency is error-prone because:

Callers might pass a different Environment instance than the one used during construction

It adds unnecessary complexity to the API

Consider removing the env parameter from getDictionaryFromPackage and using the environment from the constructor, similar to how getDictionary() works. Store the environment as a field during construction if needed.

mise.toml

...rc/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java

- Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries - Add RestHunspellCacheInvalidateAction for cache invalidation endpoint - Update HunspellService with cache management methods - Add ref_path validation in MetadataCreateIndexService Signed-off-by: shayush622 <ayush5267@gmail.com>

github-actions · 2026-02-27T11:36:52Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 865aab5.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java	1	high	An unauthenticated REST endpoint (/_hunspell/cache/_invalidate_all) is registered directly in Node.java outside the normal plugin/action registration framework with a comment explicitly labeling it 'for testing'. This endpoint allows any caller to evict all cached Hunspell dictionaries, enabling a denial-of-service by repeatedly flushing the cache and forcing expensive disk reloads. The registration bypasses the standard security and authorization checks applied to other admin endpoints, and the 'testing' comment suggests it was not intended for production but was left in.
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java	183	medium	The path traversal guard in loadDictionaryFromPackage compares packageDir.normalize() against env.configDir().toAbsolutePath(), but packageDir is built from env.configDir() (already absolute) then normalized, while the check uses toAbsolutePath() on the right-hand side only. If configDir is a relative path at construction time, startsWith() could fail to catch a traversal. A crafted packageId containing '../' sequences that survive the normalize() call (e.g., on certain filesystems) could still escape the config directory. This is in a code path triggered by user-supplied index settings.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 1 | Medium: 1 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-02T15:09:05Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 865aab5.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/node/Node.java	1676	high	Debug/testing REST endpoint registered in production with comment 'Add for hunspell invalidation cache testing'. Registered via restController.registerHandler() directly, bypassing the standard OpenSearch action/transport layer where authorization checks are enforced.
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java	68	high	New unauthenticated REST endpoints (/_hunspell/cache/_invalidate_all, /_hunspell/cache/_invalidate) allow complete dictionary cache invalidation with no privilege verification. Implemented as a direct BaseRestHandler without a transport action, sidestepping OpenSearch standard authorization. Any HTTP caller can flush the entire cache or enumerate all cached package IDs and locales.
server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java	1756	medium	Path traversal in validateRefPath: only checks that refPath starts with the string 'packages/' but never verifies the normalized resolved path stays within configDir. A value like 'packages/../../../etc/' passes the string check. After resolve+normalize it escapes configDir, and directory existence is leaked via validation error messages. HunspellService correctly uses startsWith(configDir.toAbsolutePath()) but this method lacks that control.
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java	110	medium	GET /_hunspell/cache returns all cached dictionary keys in plaintext to any caller, exposing internal package IDs and locale configurations (e.g., 'pkg-1234:en_US'). Discloses internal system topology and package deployment details without any access control.
server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java	119	low	getHunspellService() visibility changed from package-private to public without clear necessity beyond wiring the new test endpoint in Node.java. Unnecessarily expands the public API surface of HunspellService, enabling external manipulation of the service from any class in the codebase.

The table above displays the top 10 most important findings.

Total: 5 | Critical: 0 | High: 2 | Medium: 2 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

cwperks · 2026-03-02T17:54:00Z

server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java


    List<String> getIndexSettingsValidationErrors(final Settings settings, final boolean forbidPrivateIndexSettings, String indexName) {
        List<String> validationErrors = getIndexSettingsValidationErrors(settings, forbidPrivateIndexSettings, Optional.of(indexName));
+        validationErrors.addAll(validateRefPath(settings, env.configDir()));


nit: can we move this within getIndexSettingsValidationErrors to keep all validations in a single place?

cwperks · 2026-03-02T17:59:24Z

server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java

+        // Get both ref_path and locale parameters
+        String refPath = settings.get("ref_path");  // Package ID only (optional)
        String locale = settings.get("locale", settings.get("language", settings.get("lang", null)));
-        if (locale == null) {


Why only check this if ref_path is provided? Is this no longer always required?

Covered that scenario below, re-written the conditions; first we are checking for ref_path (an additional parameter) if it is present -> then we go for checking locale is present or not.
In the else part if ref-path is not there, we fall back to the original check of locale

Copilot

Pull request overview

Copilot reviewed 20 out of 22 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-03T13:19:37Z

...rc/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java

+    protected Set<String> responseParams() {
+        Set<String> params = new HashSet<>();
+        params.add("package_id");
+        params.add("cache_key");
+        params.add("locale");
+        return unmodifiableSet(params);
+    }
+
+    @Override


responseParams() is meant for response-format parameters (e.g., filter_path, human), not for request routing params. Including package_id/locale/cache_key here can mask unconsumed parameters (e.g., /_hunspell/cache/_invalidate_all?package_id=... would silently ignore package_id instead of failing strict param checks). Drop this override (or only include real response params) so invalid/unconsumed request parameters are rejected.

Suggested change

protected Set<String> responseParams() {

Set<String> params = new HashSet<>();

params.add("package_id");

params.add("cache_key");

params.add("locale");

return unmodifiableSet(params);

}

@Override

Copilot · 2026-03-03T13:19:37Z

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

+        // Additional check: ensure the resolved package directory is exactly one level under packages/
+        // This prevents packageId=".." or "foo/../bar" from escaping
+        if (!packageDirAbsolute.getParent().equals(packagesBaseDirAbsolute)) {
+            throw new IllegalArgumentException(
+                String.format(Locale.ROOT, 
+                    "Invalid package ID: [%s]. Package ID cannot contain path traversal sequences.", packageId)
+            );
+        }


loadDictionaryFromPackage claims this check prevents packageId=".." or "foo/../bar" (line comment), but toAbsolutePath().normalize() will turn foo/../bar into bar, so the getParent().equals(...) check still passes and path segments are effectively accepted. If packageId is intended to be an ID (single path segment), add explicit validation rejecting path separators/traversal (and consider the same for locale) rather than relying on normalization + parent checks; also update the misleading comment.

github-actions · 2026-03-04T14:06:15Z

PR Reviewer Guide 🔍

(Review updated until commit `20ba2c0`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Package-based Hunspell dictionary loading with ref_path parameter Relevant files: server/src/main/java/org/opensearch/indices/analysis/HunspellService.java server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff Sub-PR theme: Hunspell cache invalidation REST API and transport actions Relevant files: server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoRequest.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoResponse.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInfoAction.java server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/package-info.java server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java server/src/main/java/org/opensearch/action/ActionModule.java server/src/main/java/org/opensearch/node/Node.java server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java server/src/test/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateActionTests.java server/src/internalClusterTest/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheIT.java server/src/test/java/org/opensearch/action/ActionModuleTests.java server/src/test/java/org/opensearch/extensions/rest/RestSendToExtensionActionTests.java
⚡ Recommended focus areas for review Single-Node Scope The TransportHunspellCacheInvalidateAction extends HandledTransportAction and only executes on the coordinating node. In a multi-node cluster, each node has its own HunspellService with its own in-memory cache. Invalidating the cache on one node does not propagate to other nodes. This should likely use a broadcast/nodes action (e.g., extending TransportNodesAction) to invalidate the cache across all nodes. @Override protected void doExecute(Task task, HunspellCacheInvalidateRequest request, ActionListener<HunspellCacheInvalidateResponse> listener) { try { String packageId = request.getPackageId(); String locale = request.getLocale(); String cacheKey = request.getCacheKey(); boolean invalidateAll = request.isInvalidateAll(); int invalidatedCount = 0; String responseCacheKey = null; if (invalidateAll) { // Invalidate all cached dictionaries invalidatedCount = hunspellService.invalidateAllDictionaries(); } else if (packageId != null && locale != null) { // Invalidate specific package + locale combination responseCacheKey = HunspellService.buildPackageCacheKey(packageId, locale); boolean invalidated = hunspellService.invalidateDictionary(responseCacheKey); invalidatedCount = invalidated ? 1 : 0; } else if (packageId != null) { // Invalidate all locales for a package invalidatedCount = hunspellService.invalidateDictionariesByPackage(packageId); } else if (cacheKey != null) { // Invalidate a specific cache key directly responseCacheKey = cacheKey; boolean invalidated = hunspellService.invalidateDictionary(cacheKey); invalidatedCount = invalidated ? 1 : 0; } listener.onResponse(new HunspellCacheInvalidateResponse(true, invalidatedCount, packageId, locale, responseCacheKey)); } catch (Exception e) { listener.onFailure(e); } } Single-Node Scope Similar to the invalidate action, the cache info action only retrieves cache state from the coordinating node. In a multi-node cluster, each node has its own HunspellService cache. The response will not reflect the full cluster state. Consider using a nodes-level action to aggregate cache info across all nodes. protected void doExecute(Task task, HunspellCacheInfoRequest request, ActionListener<HunspellCacheInfoResponse> listener) { try { Set<String> cachedKeys = hunspellService.getCachedDictionaryKeys(); Set<String> packageKeys = new HashSet<>(); Set<String> localeKeys = new HashSet<>(); for (String key : cachedKeys) { if (HunspellService.isPackageCacheKey(key)) { packageKeys.add(key); } else { localeKeys.add(key); } } HunspellCacheInfoResponse response = new HunspellCacheInfoResponse( cachedKeys.size(), packageKeys.size(), localeKeys.size(), packageKeys, localeKeys ); listener.onResponse(response); } catch (Exception e) { listener.onFailure(e); } } Null HunspellService The HunspellService is conditionally bound only when non-null (guarded by `if (hunspellService != null)`). However, TransportHunspellCacheInvalidateAction and TransportHunspellCacheInfoAction both have @Inject constructors requiring HunspellService. If hunspellService is null, Guice injection will fail at runtime when these transport actions are invoked. The null guard should either be removed (always bind) or the transport actions should handle the null case gracefully. if (hunspellService != null) { bind(HunspellService.class).toInstance(hunspellService); } Race Condition in invalidateDictionariesByPackage The `invalidateDictionariesByPackage` method computes `sizeBefore` and then calls `removeIf`, but between these two operations in a concurrent environment, entries could be added or removed. The computed `count` (sizeBefore - dictionaries.size()) may be inaccurate. While the comment acknowledges this for `invalidateAllDictionaries`, the same caveat applies here but is not documented. More importantly, `removeIf` on a ConcurrentHashMap's keySet is not atomic, so concurrent loads could re-add entries during removal. public int invalidateDictionariesByPackage(String packageId) { if (Strings.isNullOrEmpty(packageId)) { logger.warn("Cannot invalidate dictionaries: packageId is null or empty"); return 0; } String prefix = packageId + CACHE_KEY_SEPARATOR; // Match keys like "pkg-1234:en_US" int sizeBefore = dictionaries.size(); dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); int count = sizeBefore - dictionaries.size(); if (count > 0) { logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", count, packageId); } else { logger.debug("No cached dictionaries found for package: {}", packageId); } return count; } Path-based Routing Risk The `buildInvalidateRequest` method uses `request.path().endsWith("/_invalidate_all")` to distinguish between the two POST routes. This is fragile — if the path matching logic changes or a trailing slash is added, the condition could silently fall through to the parameter-based branch, resulting in a request with no parameters set (which would then fail validation). It would be safer to use a dedicated route parameter or check the matched route explicitly. if (request.path().endsWith("/_invalidate_all")) { invalidateRequest.setInvalidateAll(true); } else { String packageId = request.param("package_id"); String locale = request.param("locale"); String cacheKey = request.param("cache_key"); invalidateRequest.setPackageId(packageId); invalidateRequest.setLocale(locale); invalidateRequest.setCacheKey(cacheKey); }

github-actions · 2026-03-04T14:07:06Z

PR Code Suggestions ✨

Latest suggestions up to 20ba2c0

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Avoid using computeIfAbsent with exception-throwing lambdas `computeIfAbsent` does not support checked exceptions, but if `loadDictionaryFromPackage` throws a runtime exception, the entry is NOT inserted into the map, yet `ConcurrentHashMap.computeIfAbsent` may leave the map in an inconsistent state on some JVM versions. More critically, if the lambda throws, the exception propagates but the key may be partially locked. Consider catching the exception outside `computeIfAbsent` by first checking the cache, then loading, then putting — to avoid this subtle concurrency issue. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-191] -return dictionaries.computeIfAbsent(cacheKey, (key) -> { - try { - return loadDictionaryFromPackage(packageId, locale); - } catch (Exception e) { +Dictionary cached = dictionaries.get(cacheKey); +if (cached != null) { + return cached; +} +try { + Dictionary loaded = loadDictionaryFromPackage(packageId, locale); + Dictionary existing = dictionaries.putIfAbsent(cacheKey, loaded); + return existing != null ? existing : loaded; +} catch (Exception e) { + throw new IllegalStateException( + String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), + e + ); +} - throw new IllegalStateException( - String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale), - e - ); - } -}); - Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies a subtle concurrency issue with `computeIfAbsent` when the lambda throws an exception. In `ConcurrentHashMap`, a throwing lambda can leave the map in an inconsistent state on some JVM versions. The proposed fix using `get`/`putIfAbsent` is a valid alternative, though the practical impact may be limited in most JVM versions.	Low
Possible issue	Guard against null HunspellService causing injection failure When `hunspellService` is `null`, `TransportHunspellCacheInfoAction` and `TransportHunspellCacheInvalidateAction` will fail at injection time with a cryptic Guice error because `HunspellService` is not bound. Either always bind (even a no-op instance), or guard the action registration with the same null check to prevent runtime injection failures. server/src/main/java/org/opensearch/action/ActionModule.java [1128-1131] // Bind HunspellService for TransportHunspellCacheInvalidateAction injection if (hunspellService != null) { bind(HunspellService.class).toInstance(hunspellService); +} else { + // Prevent Guice injection failure when HunspellService is unavailable + throw new IllegalStateException("HunspellService must not be null when hunspell cache actions are registered"); } Suggestion importance[1-10]: 5 __ Why: The suggestion raises a valid concern that when `hunspellService` is null, Guice injection for `TransportHunspellCacheInfoAction` and `TransportHunspellCacheInvalidateAction` will fail. However, the proposed fix of throwing an `IllegalStateException` unconditionally may be too aggressive — the existing null check pattern suggests null is an intentional supported state (e.g., in tests). A better approach might be to conditionally skip registering the actions, but the suggestion does highlight a real issue.	Low
General	Count removed entries atomically inside removeIf Using `sizeBefore - dictionaries.size()` to count removed entries is not accurate in a concurrent environment because other threads may add or remove entries between the two `size()` calls. Instead, count the actual removals directly inside `removeIf` using an `AtomicInteger`. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removedCount = count.get(); Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that `sizeBefore - dictionaries.size()` is not accurate in concurrent scenarios. Using an `AtomicInteger` inside `removeIf` provides a more accurate count of actually removed entries, which is important for correct logging and return values.	Low
General	Fix incorrect path traversal detection logic The check `value.startsWith(".")` already covers `value.equals(".")` and `value.equals("..")`, and `value.contains("./")` / `value.contains("../")` are subsumed by the path separator check below. However, the check `value.endsWith(".")` would incorrectly reject valid locale identifiers that happen to end with a dot (unlikely but possible), and more importantly, the check misses Windows-style traversal like `foo..` — the `value.contains("\.")` check uses a regex-style escape but this is plain string comparison, so `\.` matches the literal two characters `.` not `</code> followed by .. This means foo.. would not be caught by value.contains("\.") but would be caught by the path separator check. The logic is confusing and partially redundant; simplify to rely on the path separator rejection and a single .. segment check.` server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-168] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +// Reject path traversal attempts: any segment that is "." or ".." +if (value.equals(".") \|\| value.equals("..") \|\| value.contains("/..") \|\| value.contains("../") \|\| value.startsWith("..")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value) + ); +} Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that value.contains("\\\\.") is a plain string comparison matching literal \\. (two chars) rather than \ followed by ., making the Windows path traversal check ineffective. However, the improved code removes some valid checks (like value.endsWith(".")) and the path separator check below already handles backslashes, so the fix is partially correct but incomplete.	Low

Previous suggestions

Suggestions up to commit 1e9fbd5

Category	Suggestion	Impact
Possible issue	Avoid premature closure of directory used by Dictionary The `Dictionary` is created inside a try-with-resources block that closes the `NIOFSDirectory` immediately after construction. If `Dictionary` holds a reference to the directory for deferred I/O (e.g., lazy loading), closing the directory here will cause failures at use time. The directory should remain open for the lifetime of the `Dictionary`, or it must be verified that `Dictionary` copies all data eagerly before the directory is closed. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [324-326] -try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { +// Note: NIOFSDirectory must remain open as long as the Dictionary may access it. +// If Dictionary performs all reads eagerly in the constructor, closing here is safe. +// Verify Lucene Dictionary behavior before closing the directory. +Directory tmp = new NIOFSDirectory(env.tmpDir()); +try { return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); +} catch (Exception e) { + IOUtils.closeWhileHandlingException(tmp); + throw e; } +// tmp is intentionally not closed here; caller/GC handles it if Dictionary owns it. Suggestion importance[1-10]: 7 __ Why: Closing the `NIOFSDirectory` immediately after `Dictionary` construction could cause failures if `Dictionary` performs deferred I/O. This is a valid concern that could lead to hard-to-diagnose runtime errors, though the existing `getDictionary` method in the same class uses the same pattern, suggesting it may be intentional or safe with Lucene's implementation.	Medium
Possible issue	Prevent silent null binding causing runtime injection failure If `hunspellService` is null, the `TransportHunspellCacheInvalidateAction` and `TransportHunspellCacheInfoAction` will fail at injection time with a cryptic error. Either always bind the service (throwing if null) or skip registering the actions when the service is unavailable. The current silent skip of the binding will cause a runtime injection failure rather than a clear startup error. server/src/main/java/org/opensearch/action/ActionModule.java [1129-1131] if (hunspellService != null) { bind(HunspellService.class).toInstance(hunspellService); +} else { + throw new IllegalStateException("HunspellService is required for HunspellCacheInvalidateAction and HunspellCacheInfoAction"); } Suggestion importance[1-10]: 6 __ Why: The null check silently skips binding `HunspellService`, which would cause a cryptic injection failure at runtime when `TransportHunspellCacheInvalidateAction` or `TransportHunspellCacheInfoAction` are invoked. However, the test code explicitly passes `null` for `hunspellService`, suggesting this null-tolerance may be intentional for test scenarios, making a hard throw potentially too aggressive.	Low
General	Fix inaccurate concurrent removal count `ConcurrentHashMap.removeIf` is not atomic with respect to `size()`, so the computed `count` may be inaccurate in concurrent scenarios. More critically, entries added between `size()` and `removeIf` could cause a negative count. Use an `AtomicInteger` counter inside the `removeIf` predicate to accurately count removed entries. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511] -int sizeBefore = dictionaries.size(); -dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); -int count = sizeBefore - dictionaries.size(); +java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0); +dictionaries.keySet().removeIf(key -> { + if (key.startsWith(prefix)) { + count.incrementAndGet(); + return true; + } + return false; +}); +int removedCount = count.get(); +if (removedCount > 0) { + logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", removedCount, packageId); +} else { + logger.debug("No cached dictionaries found for package: {}", packageId); +} +return removedCount; Suggestion importance[1-10]: 5 __ Why: The `sizeBefore - dictionaries.size()` approach can yield inaccurate counts in concurrent scenarios, and the code itself acknowledges this is only for logging. Using an `AtomicInteger` inside `removeIf` is more accurate, though the impact is limited to diagnostic logging accuracy.	Low
Security	Fix incomplete path traversal validation logic The path traversal check is incomplete: it misses sequences like `foo/..` (without trailing slash) and Windows-style `foo..`. Since path separators are already rejected in a subsequent check, the traversal check is partially redundant, but the ordering means a value like `..foo` (starts with `..`) passes the `startsWith(".")` check but not the `equals("..")` check. Rely on the path separator rejection first, or use `Path.normalize()` comparison to detect traversal more robustly. server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167] -if (value.equals(".") - \|\| value.equals("..") - \|\| value.contains("./") - \|\| value.contains("../") - \|\| value.contains("\\.") - \|\| value.contains("\\..") - \|\| value.startsWith(".") - \|\| value.endsWith(".")) { +// Reject null bytes (security) +if (value.contains("\0")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Null bytes are not allowed.", paramName, value) + ); +} +// Reject any path separators (Unix and Windows) - must be checked before traversal detection +if (value.contains("/") \|\| value.contains("\\")) { + throw new IllegalArgumentException( + String.format( + Locale.ROOT, + "Invalid %s: [%s]. Path separators (/, \\) are not allowed. " + + "Use ref_path for package ID and locale for dictionary locale.", + paramName, + value + ) + ); +} + +// Reject path traversal attempts (no separators at this point, so only bare dots matter) +if (value.equals(".") \|\| value.equals("..")) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value) + ); +} + +// Reject cache key separator to prevent cache key injection +if (value.contains(":")) { + throw new IllegalArgumentException( + String.format( + Locale.ROOT, + "Invalid %s: [%s]. Colon (:) is not allowed as it is used as cache key separator.", + paramName, + value + ) + ); +} + Suggestion importance[1-10]: 5 __ Why: The path traversal check has some redundancy and edge cases (e.g., `..foo` passes `startsWith(".")` but the check is `startsWith(".")` which would catch it). Since path separators are rejected in a subsequent check, the traversal detection is partially redundant, but the suggestion to simplify by checking separators first is a reasonable improvement for clarity and correctness.	Low

Suggestions up to commit 87eab2c

Category	Suggestion	Impact
Possible issue	Omit null fields from XContent response output The `toXContent` method unconditionally writes `null` values for `package_id`, `locale`, and `cache_key` fields, which contradicts the test `testResponseToXContentOmitsNullValues` that expects these fields to be absent when null. Null fields should be conditionally omitted to produce clean JSON output. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java [83-85] -builder.field("package_id", packageId); -builder.field("locale", locale); -builder.field("cache_key", cacheKey); +if (packageId != null) builder.field("package_id", packageId); +if (locale != null) builder.field("locale", locale); +if (cacheKey != null) builder.field("cache_key", cacheKey); Suggestion importance[1-10]: 9 __ Why: The `toXContent` method unconditionally writes null values for `package_id`, `locale`, and `cache_key`, which directly contradicts the test `testResponseToXContentOmitsNullValues` that asserts these fields should be absent when null. This is a real bug that will cause test failures.	High
	Fix constructor signature mismatch with test usage The `RestHunspellCacheInvalidateAction` constructor takes no arguments, but the test file `RestHunspellCacheInvalidateActionTests` instantiates it with a `HunspellService` argument (`new RestHunspellCacheInvalidateAction(mockService)`). This mismatch will cause a compilation error in the tests. The constructor should accept a `HunspellService` parameter to match the test expectations, even if it's not used directly (since the service is accessed via the transport layer). server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java [56-57] -public RestHunspellCacheInvalidateAction() { +public RestHunspellCacheInvalidateAction(HunspellService hunspellService) { } Suggestion importance[1-10]: 8 __ Why: The test `RestHunspellCacheInvalidateActionTests` instantiates `RestHunspellCacheInvalidateAction` with a `HunspellService` argument, but the actual constructor takes no arguments. This mismatch will cause a compilation error in the tests.	Medium
	Pass HunspellService to REST handler constructor `RestHunspellCacheInvalidateAction` is instantiated with no arguments in `ActionModule`, but the test expects a constructor that takes a `HunspellService`. Additionally, `ActionModule` now holds a `hunspellService` field that should be passed to the REST handler if the constructor is updated. This ensures the handler is properly initialized and consistent with the test expectations. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1091] -registerHandler.accept(new RestHunspellCacheInvalidateAction()); +registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService)); Suggestion importance[1-10]: 7 __ Why: The file referenced in the suggestion (`ActionModule.java`) is incorrect (it's actually in `server/src/main/java/org/opensearch/action/ActionModule.java`), but the actual code change at line 1091 in `ActionModule.java` does instantiate `RestHunspellCacheInvalidateAction()` without arguments. If the constructor is updated to accept `HunspellService`, this call site must also be updated to pass `hunspellService`.	Medium
General	Ensure affix stream opened before dictionary streams If an exception is thrown while opening one of the `dicStreams` (e.g., the second file), previously opened streams in `dicStreams` are not closed until the `finally` block. However, if `affixStream` is never assigned (exception before line `affixStream` `= Files.newInputStream(...)`), `IOUtils.close(affixStream)` is called with `null`, which is safe. The real risk is that `IOUtils.close(dicStreams)` in the `finally` block correctly closes all opened streams. This is safe, but consider opening `affixStream` before the loop to make the resource management clearer and consistent with the existing `loadDictionary` pattern. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324] -InputStream affixStream = null; +InputStream affixStream = Files.newInputStream(affixFiles[0]); List<InputStream> dicStreams = new ArrayList<>(dicFiles.length); try { for (Path dicFile : dicFiles) { dicStreams.add(Files.newInputStream(dicFile)); } - affixStream = Files.newInputStream(affixFiles[0]); try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); } } catch (Exception e) { logger.error(() -> new ParameterizedMessage( "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e); throw e; } finally { IOUtils.close(affixStream); IOUtils.close(dicStreams); } Suggestion importance[1-10]: 3 __ Why: The suggestion proposes reordering stream initialization for clarity, but the current code is functionally correct since `IOUtils.close` handles null safely and closes all opened streams in the `finally` block. This is a minor style/consistency improvement with low impact.	Low

Suggestions up to commit 3926f7a

Category	Suggestion	Impact
Possible issue	Omit null fields from XContent serialization The `toXContent` method always writes `package_id`, `locale`, and `cache_key` fields even when they are null, which will serialize them as JSON null values. The test `testResponseToXContentOmitsNullValues` explicitly expects these fields to be absent when null. Add null checks before writing these fields. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java [83-85] -builder.field("package_id", packageId); -builder.field("locale", locale); -builder.field("cache_key", cacheKey); +if (packageId != null) builder.field("package_id", packageId); +if (locale != null) builder.field("locale", locale); +if (cacheKey != null) builder.field("cache_key", cacheKey); Suggestion importance[1-10]: 9 __ Why: The `toXContent` method always writes `package_id`, `locale`, and `cache_key` even when null, but the test `testResponseToXContentOmitsNullValues` explicitly asserts these fields are absent when null. This is a real bug that will cause test failures.	High
	Fix constructor signature mismatch with tests The `RestHunspellCacheInvalidateAction` constructor takes no arguments, but the test file instantiates it with a `HunspellService` parameter (`new` `RestHunspellCacheInvalidateAction(mockService)`). This mismatch will cause a compilation error. The constructor should accept a `HunspellService` parameter to match the test usage, or the tests are testing a different version of the class. server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java [56-57] -public RestHunspellCacheInvalidateAction() { +public RestHunspellCacheInvalidateAction(HunspellService hunspellService) { } Suggestion importance[1-10]: 8 __ Why: The test file `RestHunspellCacheInvalidateActionTests` instantiates `RestHunspellCacheInvalidateAction` with a `HunspellService` argument, but the actual constructor takes no arguments. This mismatch will cause a compilation error.	Medium
	Pass HunspellService to REST handler constructor `RestHunspellCacheInvalidateAction` is instantiated without arguments here, but the test expects a `HunspellService` constructor parameter. Additionally, the `hunspellService` field is available in `ActionModule` and should be passed to the handler so it can be used for direct local operations if needed. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1091] // Hunspell cache management API -registerHandler.accept(new RestHunspellCacheInvalidateAction()); +registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService)); Suggestion importance[1-10]: 7 __ Why: This is directly related to suggestion 2 — if the constructor is fixed to accept `HunspellService`, the registration in `ActionModule` must also pass `hunspellService`. The `hunspellService` field is available in `ActionModule` and should be passed here.	Medium
General	Fix stream opening order for proper resource cleanup The `affixStream` is closed in the `finally` block after being passed to the `Dictionary` constructor, but the `Dictionary` constructor may read from the stream during construction. If the stream is closed before the `Dictionary` finishes using it (or if the `Directory` is closed before the `Dictionary` is fully initialized), it could cause issues. More critically, if an exception occurs while opening one of the `dicStreams`, the already-opened streams won't be closed until the `finally` block, which is correct, but the `affixStream` is opened after all `dicStreams`, so if opening `affixStream` fails, `dicStreams` are still properly closed in `finally`. The logic is mostly correct but the `affixStream` should be opened before the loop or the order should be consistent. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324] InputStream affixStream = null; List<InputStream> dicStreams = new ArrayList<>(dicFiles.length); try { + affixStream = Files.newInputStream(affixFiles[0]); for (Path dicFile : dicFiles) { dicStreams.add(Files.newInputStream(dicFile)); } - affixStream = Files.newInputStream(affixFiles[0]); try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); } } catch (Exception e) { logger.error(() -> new ParameterizedMessage( "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e); throw e; } finally { IOUtils.close(affixStream); IOUtils.close(dicStreams); } Suggestion importance[1-10]: 3 __ Why: The suggested reordering (opening `affixStream` before `dicStreams`) is a minor style/consistency improvement. The existing code is functionally correct since all streams are closed in the `finally` block regardless of order, so this is a low-impact change.	Low

Suggestions up to commit e84b5bc

Category	Suggestion	Impact
Possible issue	Align conditional registration of transport actions and REST handler The REST handler for Hunspell cache management is only registered when `hunspellService` is non-null, but the transport actions `HunspellCacheInfoAction` and `HunspellCacheInvalidateAction` are always registered unconditionally. If `hunspellService` is null, the transport actions will be registered but will fail at injection time since `HunspellService` won't be bound in Guice. Either always register the handler (and let the transport action handle a null service gracefully), or guard the transport action registration with the same null check. server/src/main/java/org/opensearch/action/ActionModule.java [1091-1093] -// Hunspell cache management API +actions.register(HunspellCacheInfoAction.INSTANCE, TransportHunspellCacheInfoAction.class); +actions.register(HunspellCacheInvalidateAction.INSTANCE, TransportHunspellCacheInvalidateAction.class); +// ... and in configure(): if (hunspellService != null) { - registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService)); + bind(HunspellService.class).toInstance(hunspellService); + // register REST handler only when service is available } +// Consider making HunspellService always non-null instead Suggestion importance[1-10]: 7 __ Why: This is a real inconsistency: transport actions `HunspellCacheInfoAction` and `HunspellCacheInvalidateAction` are always registered unconditionally, but `HunspellService` is only bound in Guice when non-null, which would cause injection failures at runtime. This mismatch could lead to hard-to-diagnose errors.	Medium
	Fix potential stream resource leak on failure The `affixStream` is closed in the `finally` block, but it has already been consumed and potentially closed by the `Dictionary` constructor. More critically, if `Files.newInputStream(affixFiles[0])` throws after some `dicStreams` have been opened, those streams won't be closed since `affixStream` is still null and `IOUtils.close(affixStream)` is a no-op. The `dicStreams` will be closed by `IOUtils.close(dicStreams)`, but the affix stream opened inside the try block may leak if the Dictionary constructor fails after consuming it. Ensure all streams are properly managed, ideally using try-with-resources for each stream. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324] -InputStream affixStream = null; List<InputStream> dicStreams = new ArrayList<>(dicFiles.length); - try { for (Path dicFile : dicFiles) { dicStreams.add(Files.newInputStream(dicFile)); } - affixStream = Files.newInputStream(affixFiles[0]); - - try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { - return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); + try (InputStream affixStream = Files.newInputStream(affixFiles[0])) { + try (Directory tmp = new NIOFSDirectory(env.tmpDir())) { + return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase); + } } } catch (Exception e) { logger.error(() -> new ParameterizedMessage( "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e); throw e; } finally { - IOUtils.close(affixStream); IOUtils.close(dicStreams); } Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `affixStream` could leak if the `Dictionary` constructor fails, and using try-with-resources for `affixStream` is cleaner. However, the `Dictionary` constructor typically consumes and closes streams internally, and `IOUtils.close` in the finally block handles the null case, so the actual risk is moderate.	Low
	Avoid computeIfAbsent with exception-throwing function `ConcurrentHashMap.computeIfAbsent` does not allow the mapping function to throw checked exceptions, but if the function throws an unchecked exception (like `IllegalStateException`), the key will NOT be inserted into the map. However, a subtle issue exists: if the load fails with an `IllegalStateException`, subsequent calls will retry loading (which is correct), but the wrapping loses the original exception type. More critically, `computeIfAbsent` can deadlock if called recursively on the same key. Consider using a separate `get`-then-`put` pattern or documenting this limitation clearly. server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-191] -return dictionaries.computeIfAbsent(cacheKey, (key) -> { - try { - return loadDictionaryFromPackage(packageId, locale); - } catch (Exception e) { +Dictionary existing = dictionaries.get(cacheKey); +if (existing != null) { + return existing; +} +Dictionary loaded; +try { + loaded = loadDictionaryFromPackage(packageId, locale); +} catch (Exception e) { + throw new IllegalStateException( + String.format(Locale.ROOT, + "Failed to load hunspell dictionary for package [%s] locale [%s]", + packageId, locale), e); +} +Dictionary previous = dictionaries.putIfAbsent(cacheKey, loaded); +return previous != null ? previous : loaded; - throw new IllegalStateException( - String.format(Locale.ROOT, - "Failed to load hunspell dictionary for package [%s] locale [%s]", - packageId, locale), e); - } -}); - Suggestion importance[1-10]: 5 __ Why: The suggestion raises a valid concern about `computeIfAbsent` with exception-throwing lambdas and potential recursive deadlock, but the existing code already wraps checked exceptions in `IllegalStateException` (unchecked), so the checked exception issue doesn't apply. The `putIfAbsent` pattern is a reasonable alternative but the improvement is moderate.	Low
General	Remove side effects from validation method The `validate()` method has a side effect of mutating the object's fields (`packageId`, `locale`, `cacheKey`) by trimming them. This is unexpected behavior for a validation method and can cause issues if `validate()` is called multiple times or if the object is inspected before and after validation. Trimming should be done in the setters or in the transport action, not in `validate()`. server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java [86-89] -// Update internal state with trimmed values -this.packageId = trimmedPackageId; -this.locale = trimmedLocale; -this.cacheKey = trimmedCacheKey; +// Do not mutate fields in validate(); trimming should be done in setters or transport layer. +// Remove the three lines that reassign this.packageId, this.locale, this.cacheKey Suggestion importance[1-10]: 5 __ Why: Mutating object state inside `validate()` is indeed a design smell and can cause subtle bugs if `validate()` is called multiple times. The improved_code doesn't show concrete replacement code, making it less actionable, but the concern is valid.	Low

github-actions · 2026-03-04T14:22:15Z

❌ Gradle check result for e84b5bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

.../java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java

...java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java

...rc/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java

...n/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java

- Add ref_path parameter for package-based dictionary loading - Load from config/packages/{packageId}/hunspell/{locale}/ - Add cache invalidation REST API (GET/POST /_hunspell/cache/_invalidate) - Add TransportAction with cluster:admin permission - Add comprehensive security validation (path traversal, separators, cache-key injection) - Add updateable flag for hot-reload via _reload_search_analyzers - Add comprehensive test coverage PR feedback addressed: - Stricter validate() to reject conflicting params - Path traversal checks now use config/packages/ as base - ref_path/locale validation rejects ., .., /, \, : characters Signed-off-by: shayush622 <ayush5267@gmail.com>

github-actions · 2026-03-04T16:29:59Z

Persistent review updated to latest commit 3926f7a

github-actions · 2026-03-04T16:43:56Z

❌ Gradle check result for 3926f7a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

...n/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java

github-actions · 2026-03-06T12:42:23Z

Persistent review updated to latest commit 87eab2c

github-actions · 2026-03-06T12:56:01Z

❌ Gradle check result for 87eab2c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: shayush622 <ayush5267@gmail.com>

github-actions · 2026-03-06T16:51:56Z

Persistent review updated to latest commit 1e9fbd5

- Add ref_path parameter for package-based dictionary loading - Load from config/packages/{packageId}/hunspell/{locale}/ - Add cache info API: GET /_hunspell/cache (cluster:monitor/hunspell/cache) - Add cache invalidation API: POST /_hunspell/cache/_invalidate (cluster:admin/hunspell/cache/invalidate) - Support invalidation by package_id, locale, cache_key, or invalidate_all - Add security validation (path traversal, separator injection, null bytes) - Add updateable flag for hot-reload via _reload_search_analyzers - Use Strings.hasText() and Strings.isNullOrEmpty() for validation consistency - Consistent response schema with all fields always present - Add unit tests, REST handler tests, and integration tests Signed-off-by: shayush622 <ayush5267@gmail.com>

github-actions · 2026-03-06T16:54:17Z

Persistent review updated to latest commit 20ba2c0

github-actions · 2026-03-06T17:59:06Z

❌ Gradle check result for 20ba2c0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

shayush622 · 2026-03-06T18:28:52Z

Superseded by #20792 — recreated with clean history

Copilot AI review requested due to automatic review settings February 27, 2026 11:18

shayush622 requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners February 27, 2026 11:18

Copilot started reviewing on behalf of shayush622 February 27, 2026 11:19 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

shayush622 force-pushed the main branch from 0e293c9 to 865aab5 Compare February 27, 2026 11:36

cwperks reviewed Mar 2, 2026

View reviewed changes

shayush622 requested a review from Copilot March 3, 2026 13:05

Copilot started reviewing on behalf of shayush622 March 3, 2026 13:06 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

cwperks added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label Mar 4, 2026

cwperks reviewed Mar 4, 2026

View reviewed changes

.../java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java Show resolved Hide resolved

cwperks reviewed Mar 4, 2026

View reviewed changes

.../java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java Outdated Show resolved Hide resolved

cwperks reviewed Mar 4, 2026

View reviewed changes

...java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java Outdated Show resolved Hide resolved

cwperks reviewed Mar 4, 2026

View reviewed changes

...rc/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java Outdated Show resolved Hide resolved

cwperks reviewed Mar 4, 2026

View reviewed changes

...n/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java Outdated Show resolved Hide resolved

shayush622 force-pushed the main branch from e84b5bc to 3926f7a Compare March 4, 2026 16:28

shayush622 requested a review from cwperks March 4, 2026 16:30

gaobinlong reviewed Mar 6, 2026

View reviewed changes

Merge branch 'opensearch-project:main' into main

c1dd00b

Signed-off-by: shayush622 <ayush5267@gmail.com>

shayush622 force-pushed the main branch from 87eab2c to 1e9fbd5 Compare March 6, 2026 16:50

shayush622 force-pushed the main branch from 1e9fbd5 to 20ba2c0 Compare March 6, 2026 16:52

shayush622 closed this Mar 6, 2026

github-actions bot mentioned this pull request Mar 6, 2026

Feat/hunspell ref path #20792

Closed

3 tasks

	// Add for hunspell invalidation cache testing:
	HunspellService hunspellService = analysisModule.getHunspellService();
	restController.registerHandler(new RestHunspellCacheInvalidateAction(hunspellService));

Conversation

shayush622 commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Feb 27, 2026

PR Code Analyzer ❗

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

cwperks Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

shayush622 Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Feb 27, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 2, 2026

PR Code Analyzer ❗

Uh oh!

cwperks Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

cwperks Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

shayush622 Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 20ba2c0)

Uh oh!

github-actions bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

shayush622 commented Feb 27, 2026 •

edited

Loading

github-actions bot commented Mar 4, 2026 •

edited

Loading

(Review updated until commit `20ba2c0`)

github-actions bot commented Mar 4, 2026 •

edited

Loading