Skip to content

feat(analysis): Add package-based Hunspell dictionary support with ref_path parameter and cache invalidation API#20741

Closed
shayush622 wants to merge 7 commits intoopensearch-project:mainfrom
shayush622:main
Closed

feat(analysis): Add package-based Hunspell dictionary support with ref_path parameter and cache invalidation API#20741
shayush622 wants to merge 7 commits intoopensearch-project:mainfrom
shayush622:main

Conversation

@shayush622
Copy link
Contributor

@shayush622 shayush622 commented Feb 27, 2026

  • Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries
  • Add RestHunspellCacheInvalidateAction for cache invalidation endpoint
  • Update HunspellService with cache management methods
  • Add ref_path validation in MetadataCreateIndexService

Description

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter, enabling multi-tenant dictionary isolation and hot-reload capabilities.

Related Issues

Resolves #[20712]
Link to RFC.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 0e293c9.

PathLineSeverityDescription
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java1highA new unauthenticated REST endpoint (/_hunspell/cache/_invalidate_all) is registered directly in Node.java with a comment explicitly stating it is 'for hunspell invalidation cache testing'. This endpoint exposes cache management operations (including full cache wipe) via HTTP without any authorization check or role-based access control, and bypasses the normal plugin/action registration framework. An attacker with network access to the node could repeatedly invalidate all dictionaries causing denial-of-service or resource exhaustion. The 'testing' comment strongly suggests this was not intended for production but was left in deliberately.
server/src/main/java/org/opensearch/node/Node.java1676mediumThe REST handler registration is done inline in Node bootstrap outside of the standard REST action plugin registration path, bypassing any framework-level security enforcement (e.g., transport-layer action filters, permission checks). The inline comment '// Add for hunspell invalidation cache testing:' signals this is debug/test infrastructure committed to production code, which is a suspicious pattern that could be a backdoor or intentional weakening of the security boundary around cache management.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 1 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for loading Hunspell dictionaries from package-based directories using a new ref_path parameter, enabling multi-tenant dictionary isolation and hot-reload capabilities through a new cache management REST API.

Changes:

  • Introduces package-based dictionary loading with ref_path parameter alongside traditional locale-based loading
  • Adds REST API endpoints for Hunspell cache management (view, invalidate by package/key, invalidate all)
  • Implements cache invalidation methods in HunspellService for hot-reload support
  • Adds index.ref_path index-level setting with validation (though usage is unclear)

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java Adds getDictionaryFromPackage(), cache invalidation methods, and utility methods for package-based dictionary management
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java New REST handler for cache management with GET/POST endpoints
server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java Updates to support ref_path parameter and hot-reload via updateable flag
server/src/main/java/org/opensearch/index/IndexSettings.java Adds INDEX_REF_PATH_SETTING as dynamic index-level setting
server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java Adds validateRefPath() for index.ref_path validation
server/src/main/java/org/opensearch/node/Node.java Registers RestHunspellCacheInvalidateAction
server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java Makes getHunspellService() public and adds Environment parameter to HunspellTokenFilterFactory
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java Registers INDEX_REF_PATH_SETTING
server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java Adds comprehensive tests for package-based dictionary loading and cache management
mise.toml Adds development environment configuration (unrelated to PR)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1752 to +1754
if (!refPath.startsWith("packages/")) {
validationErrors.add("ref_path [" + refPath + "] must start with 'packages/'");
return validationErrors;
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ref_path validation expects the value to start with "packages/" (e.g., "packages/pkg-1234"), but HunspellTokenFilterFactory at line 87 expects ref_path to be just the package ID (e.g., "pkg-1234") without the "packages/" prefix. This creates an inconsistency where the validation will reject valid usage patterns.

Looking at HunspellService.loadDictionaryFromPackage (lines 199-201), it constructs the path as config/packages/{packageId}, meaning it expects just the package ID, not the full path.

The validation should either:

  1. Accept just the package ID (e.g., "pkg-1234") to match the usage, or
  2. Update HunspellTokenFilterFactory and HunspellService to expect the full path with "packages/" prefix

Option 1 is recommended for consistency with the implementation and to avoid confusion for users.

Copilot uses AI. Check for mistakes.
Comment on lines +1676 to +1679
// Add for hunspell invalidation cache testing:
HunspellService hunspellService = analysisModule.getHunspellService();
restController.registerHandler(new RestHunspellCacheInvalidateAction(hunspellService));

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REST handlers should typically be registered in ActionModule's constructor using the standard pattern with registerHandler.accept(), not directly in Node.java. This breaks the established pattern where all REST handlers are registered in one central location in ActionModule (around lines 850-1077 in ActionModule.java).

Consider moving this registration to ActionModule.java to maintain consistency with other REST handlers in the codebase. The HunspellService can be passed as a parameter to RestHunspellCacheInvalidateAction or retrieved from the AnalysisModule if needed.

Suggested change
// Add for hunspell invalidation cache testing:
HunspellService hunspellService = analysisModule.getHunspellService();
restController.registerHandler(new RestHunspellCacheInvalidateAction(hunspellService));

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Let's address this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, addressed

Comment on lines +199 to +231
Path packageDir = env.configDir()
.resolve("packages")
.resolve(packageId);

// Security check: ensure path is under config directory
if (!packageDir.normalize().startsWith(env.configDir().toAbsolutePath())) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)
);
}

// Check if package directory exists
if (!Files.isDirectory(packageDir)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Package directory not found: [%s]. Expected at: %s",
packageId, packageDir)
);
}

// Auto-detect hunspell directory within package
Path hunspellDir = packageDir.resolve("hunspell");
if (!Files.isDirectory(hunspellDir)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Hunspell directory not found in package [%s]. " +
"Expected 'hunspell' subdirectory at: %s",
packageId, hunspellDir)
);
}

// Resolve locale directory within hunspell
Path dicDir = hunspellDir.resolve(locale);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path traversal check at line 204 is not entirely correct. The check compares packageDir.normalize().startsWith(env.configDir().toAbsolutePath()), but packageDir is not converted to absolute path before normalization. This could allow path traversal attacks with inputs like "../../../etc" because:

  1. packageDir is relative (config/packages/../../../etc)
  2. After normalize(), it becomes a different relative path
  3. The startsWith check may fail to detect the traversal

The check should be: if (!packageDir.toAbsolutePath().normalize().startsWith(env.configDir().toAbsolutePath().normalize())) to ensure both paths are absolute and normalized before comparison.

Additionally, the locale parameter (line 231: dicDir = hunspellDir.resolve(locale)) is not validated for path traversal. A malicious locale value like "../../../etc/passwd" could escape the hunspell directory. Add similar validation for the locale parameter.

Suggested change
Path packageDir = env.configDir()
.resolve("packages")
.resolve(packageId);
// Security check: ensure path is under config directory
if (!packageDir.normalize().startsWith(env.configDir().toAbsolutePath())) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)
);
}
// Check if package directory exists
if (!Files.isDirectory(packageDir)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Package directory not found: [%s]. Expected at: %s",
packageId, packageDir)
);
}
// Auto-detect hunspell directory within package
Path hunspellDir = packageDir.resolve("hunspell");
if (!Files.isDirectory(hunspellDir)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Hunspell directory not found in package [%s]. " +
"Expected 'hunspell' subdirectory at: %s",
packageId, hunspellDir)
);
}
// Resolve locale directory within hunspell
Path dicDir = hunspellDir.resolve(locale);
Path configDir = env.configDir().toAbsolutePath().normalize();
Path packageDir = configDir
.resolve("packages")
.resolve(packageId);
Path packageDirAbs = packageDir.toAbsolutePath().normalize();
// Security check: ensure path is under config directory
if (!packageDirAbs.startsWith(configDir)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Package path must be under config directory. Package: [%s]", packageId)
);
}
// Check if package directory exists
if (!Files.isDirectory(packageDirAbs)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Package directory not found: [%s]. Expected at: %s",
packageId, packageDirAbs)
);
}
// Auto-detect hunspell directory within package
Path hunspellDir = packageDirAbs.resolve("hunspell");
Path hunspellDirAbs = hunspellDir.toAbsolutePath().normalize();
if (!Files.isDirectory(hunspellDirAbs)) {
throw new OpenSearchException(
String.format(Locale.ROOT,
"Hunspell directory not found in package [%s]. " +
"Expected 'hunspell' subdirectory at: %s",
packageId, hunspellDirAbs)
);
}
// Resolve locale directory within hunspell and validate against traversal
Path dicDir = hunspellDirAbs.resolve(locale);
Path dicDirAbs = dicDir.toAbsolutePath().normalize();
if (!dicDirAbs.startsWith(hunspellDirAbs)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Locale path must be under hunspell directory. Package: [%s], locale: [%s]", packageId, locale)
);
}
dicDir = dicDirAbs;

Copilot uses AI. Check for mistakes.
Comment on lines +494 to +500
*
* @return count of invalidated cache entries
*/
public int invalidateAllDictionaries() {
int count = dictionaries.size();
dictionaries.clear();
logger.info("Invalidated all {} cached hunspell dictionaries", count);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invalidateAllDictionaries method has a race condition. Line 498 reads the size, then line 499 clears the map. Between these two operations, other threads could add or remove entries, making the returned count inaccurate. While this is a minor issue for a diagnostic return value, it could lead to confusion in logs.

Consider using int count = dictionaries.size(); dictionaries.clear(); return count; in a single atomic operation or accepting that the count may be approximate and documenting this behavior.

Suggested change
*
* @return count of invalidated cache entries
*/
public int invalidateAllDictionaries() {
int count = dictionaries.size();
dictionaries.clear();
logger.info("Invalidated all {} cached hunspell dictionaries", count);
* <p>
* Note: The returned count is based on the cache size observed just before clearing and may be
* approximate if other threads are concurrently adding or removing entries.
*
* @return approximate count of invalidated cache entries
*/
public int invalidateAllDictionaries() {
int count = dictionaries.size();
dictionaries.clear();
logger.info(
"Invalidated all cached hunspell dictionaries; previous observed cache size was {} (may be approximate due to concurrent updates)",
count
);

Copilot uses AI. Check for mistakes.

/**
* The token filter factory for the hunspell analyzer
* *
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a double asterisk "* *" on line 46 which appears to be a formatting error in the Javadoc comment. Remove the extra asterisk to maintain proper documentation formatting.

Suggested change
* *
*

Copilot uses AI. Check for mistakes.
}

// Resolve locale directory within hunspell
Path dicDir = hunspellDir.resolve(locale);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locale parameter is resolved to a directory path (line 231) without validation for path traversal. A malicious locale value like "../../../etc/passwd" or "../../.." could escape the hunspell directory and access arbitrary files on the system.

Add validation after line 231 to ensure the resolved dicDir is still under the hunspellDir:

if (!dicDir.toAbsolutePath().normalize().startsWith(hunspellDir.toAbsolutePath().normalize())) {
    throw new IllegalArgumentException(
        String.format(Locale.ROOT, "Locale path must be under hunspell directory. Locale: [%s]", locale)
    );
}

This prevents directory traversal attacks via the locale parameter.

Suggested change
Path dicDir = hunspellDir.resolve(locale);
Path dicDir = hunspellDir.resolve(locale);
if (!dicDir.toAbsolutePath().normalize().startsWith(hunspellDir.toAbsolutePath().normalize())) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Locale path must be under hunspell directory. Locale: [%s]", locale)
);
}

Copilot uses AI. Check for mistakes.
* @throws IllegalArgumentException if packageId or locale is null
* @throws OpenSearchException if hunspell directory not found or dictionary cannot be loaded
*/
public Dictionary getDictionaryFromPackage(String packageId, String locale, Environment env) {
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getDictionaryFromPackage method requires an Environment parameter, but HunspellService already has access to the environment from its constructor (used to resolve hunspellDir at line 125). This creates inconsistency:

  1. getDictionary() uses the environment from the constructor
  2. getDictionaryFromPackage() requires environment to be passed as a parameter

This inconsistency is error-prone because:

  • Callers might pass a different Environment instance than the one used during construction
  • It adds unnecessary complexity to the API

Consider removing the env parameter from getDictionaryFromPackage and using the environment from the constructor, similar to how getDictionary() works. Store the environment as a field during construction if needed.

Copilot uses AI. Check for mistakes.
- Add INDEX_REF_PATH_SETTING for package-based hunspell dictionaries
- Add RestHunspellCacheInvalidateAction for cache invalidation endpoint
- Update HunspellService with cache management methods
- Add ref_path validation in MetadataCreateIndexService

Signed-off-by: shayush622 <ayush5267@gmail.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 865aab5.

PathLineSeverityDescription
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java1highAn unauthenticated REST endpoint (/_hunspell/cache/_invalidate_all) is registered directly in Node.java outside the normal plugin/action registration framework with a comment explicitly labeling it 'for testing'. This endpoint allows any caller to evict all cached Hunspell dictionaries, enabling a denial-of-service by repeatedly flushing the cache and forcing expensive disk reloads. The registration bypasses the standard security and authorization checks applied to other admin endpoints, and the 'testing' comment suggests it was not intended for production but was left in.
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java183mediumThe path traversal guard in loadDictionaryFromPackage compares packageDir.normalize() against env.configDir().toAbsolutePath(), but packageDir is built from env.configDir() (already absolute) then normalized, while the check uses toAbsolutePath() on the right-hand side only. If configDir is a relative path at construction time, startsWith() could fail to catch a traversal. A crafted packageId containing '../' sequences that survive the normalize() call (e.g., on certain filesystems) could still escape the config directory. This is in a code path triggered by user-supplied index settings.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 1 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 865aab5.

PathLineSeverityDescription
server/src/main/java/org/opensearch/node/Node.java1676highDebug/testing REST endpoint registered in production with comment 'Add for hunspell invalidation cache testing'. Registered via restController.registerHandler() directly, bypassing the standard OpenSearch action/transport layer where authorization checks are enforced.
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java68highNew unauthenticated REST endpoints (/_hunspell/cache/_invalidate_all, /_hunspell/cache/_invalidate) allow complete dictionary cache invalidation with no privilege verification. Implemented as a direct BaseRestHandler without a transport action, sidestepping OpenSearch standard authorization. Any HTTP caller can flush the entire cache or enumerate all cached package IDs and locales.
server/src/main/java/org/opensearch/cluster/metadata/MetadataCreateIndexService.java1756mediumPath traversal in validateRefPath: only checks that refPath starts with the string 'packages/' but never verifies the normalized resolved path stays within configDir. A value like 'packages/../../../etc/' passes the string check. After resolve+normalize it escapes configDir, and directory existence is leaked via validation error messages. HunspellService correctly uses startsWith(configDir.toAbsolutePath()) but this method lacks that control.
server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java110mediumGET /_hunspell/cache returns all cached dictionary keys in plaintext to any caller, exposing internal package IDs and locale configurations (e.g., 'pkg-1234:en_US'). Discloses internal system topology and package deployment details without any access control.
server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java119lowgetHunspellService() visibility changed from package-private to public without clear necessity beyond wiring the new test endpoint in Node.java. Unnecessarily expands the public API surface of HunspellService, enabling external manipulation of the service from any class in the codebase.

The table above displays the top 10 most important findings.

Total: 5 | Critical: 0 | High: 2 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.


List<String> getIndexSettingsValidationErrors(final Settings settings, final boolean forbidPrivateIndexSettings, String indexName) {
List<String> validationErrors = getIndexSettingsValidationErrors(settings, forbidPrivateIndexSettings, Optional.of(indexName));
validationErrors.addAll(validateRefPath(settings, env.configDir()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we move this within getIndexSettingsValidationErrors to keep all validations in a single place?

// Get both ref_path and locale parameters
String refPath = settings.get("ref_path"); // Package ID only (optional)
String locale = settings.get("locale", settings.get("language", settings.get("lang", null)));
if (locale == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only check this if ref_path is provided? Is this no longer always required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered that scenario below, re-written the conditions; first we are checking for ref_path (an additional parameter) if it is present -> then we go for checking locale is present or not.
In the else part if ref-path is not there, we fall back to the original check of locale

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 22 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +83 to +91
protected Set<String> responseParams() {
Set<String> params = new HashSet<>();
params.add("package_id");
params.add("cache_key");
params.add("locale");
return unmodifiableSet(params);
}

@Override
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responseParams() is meant for response-format parameters (e.g., filter_path, human), not for request routing params. Including package_id/locale/cache_key here can mask unconsumed parameters (e.g., /_hunspell/cache/_invalidate_all?package_id=... would silently ignore package_id instead of failing strict param checks). Drop this override (or only include real response params) so invalid/unconsumed request parameters are rejected.

Suggested change
protected Set<String> responseParams() {
Set<String> params = new HashSet<>();
params.add("package_id");
params.add("cache_key");
params.add("locale");
return unmodifiableSet(params);
}
@Override

Copilot uses AI. Check for mistakes.
Comment on lines +220 to +227
// Additional check: ensure the resolved package directory is exactly one level under packages/
// This prevents packageId=".." or "foo/../bar" from escaping
if (!packageDirAbsolute.getParent().equals(packagesBaseDirAbsolute)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT,
"Invalid package ID: [%s]. Package ID cannot contain path traversal sequences.", packageId)
);
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadDictionaryFromPackage claims this check prevents packageId=".." or "foo/../bar" (line comment), but toAbsolutePath().normalize() will turn foo/../bar into bar, so the getParent().equals(...) check still passes and path segments are effectively accepted. If packageId is intended to be an ID (single path segment), add explicit validation rejecting path separators/traversal (and consider the same for locale) rather than relying on normalization + parent checks; also update the misleading comment.

Copilot uses AI. Check for mistakes.
@cwperks cwperks added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label Mar 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 20ba2c0)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Package-based Hunspell dictionary loading with ref_path parameter

Relevant files:

  • server/src/main/java/org/opensearch/indices/analysis/HunspellService.java
  • server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java
  • server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java
  • server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java
  • server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff

Sub-PR theme: Hunspell cache invalidation REST API and transport actions

Relevant files:

  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoAction.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoRequest.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoResponse.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInvalidateAction.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInfoAction.java
  • server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/package-info.java
  • server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java
  • server/src/main/java/org/opensearch/action/ActionModule.java
  • server/src/main/java/org/opensearch/node/Node.java
  • server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java
  • server/src/test/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateActionTests.java
  • server/src/internalClusterTest/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheIT.java
  • server/src/test/java/org/opensearch/action/ActionModuleTests.java
  • server/src/test/java/org/opensearch/extensions/rest/RestSendToExtensionActionTests.java

⚡ Recommended focus areas for review

Single-Node Scope

The TransportHunspellCacheInvalidateAction extends HandledTransportAction and only executes on the coordinating node. In a multi-node cluster, each node has its own HunspellService with its own in-memory cache. Invalidating the cache on one node does not propagate to other nodes. This should likely use a broadcast/nodes action (e.g., extending TransportNodesAction) to invalidate the cache across all nodes.

@Override
protected void doExecute(Task task, HunspellCacheInvalidateRequest request, ActionListener<HunspellCacheInvalidateResponse> listener) {
    try {
        String packageId = request.getPackageId();
        String locale = request.getLocale();
        String cacheKey = request.getCacheKey();
        boolean invalidateAll = request.isInvalidateAll();

        int invalidatedCount = 0;
        String responseCacheKey = null;

        if (invalidateAll) {
            // Invalidate all cached dictionaries
            invalidatedCount = hunspellService.invalidateAllDictionaries();
        } else if (packageId != null && locale != null) {
            // Invalidate specific package + locale combination
            responseCacheKey = HunspellService.buildPackageCacheKey(packageId, locale);
            boolean invalidated = hunspellService.invalidateDictionary(responseCacheKey);
            invalidatedCount = invalidated ? 1 : 0;
        } else if (packageId != null) {
            // Invalidate all locales for a package
            invalidatedCount = hunspellService.invalidateDictionariesByPackage(packageId);
        } else if (cacheKey != null) {
            // Invalidate a specific cache key directly
            responseCacheKey = cacheKey;
            boolean invalidated = hunspellService.invalidateDictionary(cacheKey);
            invalidatedCount = invalidated ? 1 : 0;
        }

        listener.onResponse(new HunspellCacheInvalidateResponse(true, invalidatedCount, packageId, locale, responseCacheKey));
    } catch (Exception e) {
        listener.onFailure(e);
    }
}
Single-Node Scope

Similar to the invalidate action, the cache info action only retrieves cache state from the coordinating node. In a multi-node cluster, each node has its own HunspellService cache. The response will not reflect the full cluster state. Consider using a nodes-level action to aggregate cache info across all nodes.

protected void doExecute(Task task, HunspellCacheInfoRequest request, ActionListener<HunspellCacheInfoResponse> listener) {
    try {
        Set<String> cachedKeys = hunspellService.getCachedDictionaryKeys();

        Set<String> packageKeys = new HashSet<>();
        Set<String> localeKeys = new HashSet<>();

        for (String key : cachedKeys) {
            if (HunspellService.isPackageCacheKey(key)) {
                packageKeys.add(key);
            } else {
                localeKeys.add(key);
            }
        }

        HunspellCacheInfoResponse response = new HunspellCacheInfoResponse(
            cachedKeys.size(),
            packageKeys.size(),
            localeKeys.size(),
            packageKeys,
            localeKeys
        );

        listener.onResponse(response);
    } catch (Exception e) {
        listener.onFailure(e);
    }
}
Null HunspellService

The HunspellService is conditionally bound only when non-null (guarded by if (hunspellService != null)). However, TransportHunspellCacheInvalidateAction and TransportHunspellCacheInfoAction both have @Inject constructors requiring HunspellService. If hunspellService is null, Guice injection will fail at runtime when these transport actions are invoked. The null guard should either be removed (always bind) or the transport actions should handle the null case gracefully.

if (hunspellService != null) {
    bind(HunspellService.class).toInstance(hunspellService);
}
Race Condition in invalidateDictionariesByPackage

The invalidateDictionariesByPackage method computes sizeBefore and then calls removeIf, but between these two operations in a concurrent environment, entries could be added or removed. The computed count (sizeBefore - dictionaries.size()) may be inaccurate. While the comment acknowledges this for invalidateAllDictionaries, the same caveat applies here but is not documented. More importantly, removeIf on a ConcurrentHashMap's keySet is not atomic, so concurrent loads could re-add entries during removal.

public int invalidateDictionariesByPackage(String packageId) {
    if (Strings.isNullOrEmpty(packageId)) {
        logger.warn("Cannot invalidate dictionaries: packageId is null or empty");
        return 0;
    }

    String prefix = packageId + CACHE_KEY_SEPARATOR;  // Match keys like "pkg-1234:en_US"
    int sizeBefore = dictionaries.size();
    dictionaries.keySet().removeIf(key -> key.startsWith(prefix));
    int count = sizeBefore - dictionaries.size();

    if (count > 0) {
        logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", count, packageId);
    } else {
        logger.debug("No cached dictionaries found for package: {}", packageId);
    }
    return count;
}
Path-based Routing Risk

The buildInvalidateRequest method uses request.path().endsWith("/_invalidate_all") to distinguish between the two POST routes. This is fragile — if the path matching logic changes or a trailing slash is added, the condition could silently fall through to the parameter-based branch, resulting in a request with no parameters set (which would then fail validation). It would be safer to use a dedicated route parameter or check the matched route explicitly.

if (request.path().endsWith("/_invalidate_all")) {
    invalidateRequest.setInvalidateAll(true);
} else {
    String packageId = request.param("package_id");
    String locale = request.param("locale");
    String cacheKey = request.param("cache_key");

    invalidateRequest.setPackageId(packageId);
    invalidateRequest.setLocale(locale);
    invalidateRequest.setCacheKey(cacheKey);
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 20ba2c0

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid using computeIfAbsent with exception-throwing lambdas

computeIfAbsent does not support checked exceptions, but if
loadDictionaryFromPackage throws a runtime exception, the entry is NOT inserted into
the map, yet ConcurrentHashMap.computeIfAbsent may leave the map in an inconsistent
state on some JVM versions. More critically, if the lambda throws, the exception
propagates but the key may be partially locked. Consider catching the exception
outside computeIfAbsent by first checking the cache, then loading, then putting — to
avoid this subtle concurrency issue.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-191]

-return dictionaries.computeIfAbsent(cacheKey, (key) -> {
-    try {
-        return loadDictionaryFromPackage(packageId, locale);
-    } catch (Exception e) {
+Dictionary cached = dictionaries.get(cacheKey);
+if (cached != null) {
+    return cached;
+}
+try {
+    Dictionary loaded = loadDictionaryFromPackage(packageId, locale);
+    Dictionary existing = dictionaries.putIfAbsent(cacheKey, loaded);
+    return existing != null ? existing : loaded;
+} catch (Exception e) {
+    throw new IllegalStateException(
+        String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale),
+        e
+    );
+}
 
-        throw new IllegalStateException(
-            String.format(Locale.ROOT, "Failed to load hunspell dictionary for package [%s] locale [%s]", packageId, locale),
-            e
-        );
-    }
-});
-
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a subtle concurrency issue with computeIfAbsent when the lambda throws an exception. In ConcurrentHashMap, a throwing lambda can leave the map in an inconsistent state on some JVM versions. The proposed fix using get/putIfAbsent is a valid alternative, though the practical impact may be limited in most JVM versions.

Low
Guard against null HunspellService causing injection failure

When hunspellService is null, TransportHunspellCacheInfoAction and
TransportHunspellCacheInvalidateAction will fail at injection time with a cryptic
Guice error because HunspellService is not bound. Either always bind (even a no-op
instance), or guard the action registration with the same null check to prevent
runtime injection failures.

server/src/main/java/org/opensearch/action/ActionModule.java [1128-1131]

 // Bind HunspellService for TransportHunspellCacheInvalidateAction injection
 if (hunspellService != null) {
     bind(HunspellService.class).toInstance(hunspellService);
+} else {
+    // Prevent Guice injection failure when HunspellService is unavailable
+    throw new IllegalStateException("HunspellService must not be null when hunspell cache actions are registered");
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern that when hunspellService is null, Guice injection for TransportHunspellCacheInfoAction and TransportHunspellCacheInvalidateAction will fail. However, the proposed fix of throwing an IllegalStateException unconditionally may be too aggressive — the existing null check pattern suggests null is an intentional supported state (e.g., in tests). A better approach might be to conditionally skip registering the actions, but the suggestion does highlight a real issue.

Low
General
Count removed entries atomically inside removeIf

Using sizeBefore - dictionaries.size() to count removed entries is not accurate in a
concurrent environment because other threads may add or remove entries between the
two size() calls. Instead, count the actual removals directly inside removeIf using
an AtomicInteger.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511]

-int sizeBefore = dictionaries.size();
-dictionaries.keySet().removeIf(key -> key.startsWith(prefix));
-int count = sizeBefore - dictionaries.size();
+java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0);
+dictionaries.keySet().removeIf(key -> {
+    if (key.startsWith(prefix)) {
+        count.incrementAndGet();
+        return true;
+    }
+    return false;
+});
+int removedCount = count.get();
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that sizeBefore - dictionaries.size() is not accurate in concurrent scenarios. Using an AtomicInteger inside removeIf provides a more accurate count of actually removed entries, which is important for correct logging and return values.

Low
Fix incorrect path traversal detection logic

The check value.startsWith(".") already covers value.equals(".") and
value.equals(".."), and value.contains("./") / value.contains("../") are subsumed by
the path separator check below. However, the check value.endsWith(".") would
incorrectly reject valid locale identifiers that happen to end with a dot (unlikely
but possible), and more importantly, the check misses Windows-style traversal like
foo.. — the value.contains("\.") check uses a regex-style escape but this is plain
string comparison, so \. matches the literal two characters . not </code> followed by ..
This means foo.. would not be caught by value.contains("\.") but would be caught
by the path separator check. The logic is confusing and partially redundant;
simplify to rely on the path separator rejection and a single .. segment check.

server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-168]

-if (value.equals(".")
-    || value.equals("..")
-    || value.contains("./")
-    || value.contains("../")
-    || value.contains("\\.")
-    || value.contains("\\..")
-    || value.startsWith(".")
-    || value.endsWith(".")) {
+// Reject path traversal attempts: any segment that is "." or ".."
+if (value.equals(".") || value.equals("..") || value.contains("/..") || value.contains("../") || value.startsWith("..")) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value)
+    );
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that value.contains("\\\\.") is a plain string comparison matching literal \\. (two chars) rather than \ followed by ., making the Windows path traversal check ineffective. However, the improved code removes some valid checks (like value.endsWith(".")) and the path separator check below already handles backslashes, so the fix is partially correct but incomplete.

Low

Previous suggestions

Suggestions up to commit 1e9fbd5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid premature closure of directory used by Dictionary

The Dictionary is created inside a try-with-resources block that closes the
NIOFSDirectory immediately after construction. If Dictionary holds a reference to
the directory for deferred I/O (e.g., lazy loading), closing the directory here will
cause failures at use time. The directory should remain open for the lifetime of the
Dictionary, or it must be verified that Dictionary copies all data eagerly before
the directory is closed.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [324-326]

-try (Directory tmp = new NIOFSDirectory(env.tmpDir())) {
+// Note: NIOFSDirectory must remain open as long as the Dictionary may access it.
+// If Dictionary performs all reads eagerly in the constructor, closing here is safe.
+// Verify Lucene Dictionary behavior before closing the directory.
+Directory tmp = new NIOFSDirectory(env.tmpDir());
+try {
     return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
+} catch (Exception e) {
+    IOUtils.closeWhileHandlingException(tmp);
+    throw e;
 }
+// tmp is intentionally not closed here; caller/GC handles it if Dictionary owns it.
Suggestion importance[1-10]: 7

__

Why: Closing the NIOFSDirectory immediately after Dictionary construction could cause failures if Dictionary performs deferred I/O. This is a valid concern that could lead to hard-to-diagnose runtime errors, though the existing getDictionary method in the same class uses the same pattern, suggesting it may be intentional or safe with Lucene's implementation.

Medium
Prevent silent null binding causing runtime injection failure

If hunspellService is null, the TransportHunspellCacheInvalidateAction and
TransportHunspellCacheInfoAction will fail at injection time with a cryptic error.
Either always bind the service (throwing if null) or skip registering the actions
when the service is unavailable. The current silent skip of the binding will cause a
runtime injection failure rather than a clear startup error.

server/src/main/java/org/opensearch/action/ActionModule.java [1129-1131]

 if (hunspellService != null) {
     bind(HunspellService.class).toInstance(hunspellService);
+} else {
+    throw new IllegalStateException("HunspellService is required for HunspellCacheInvalidateAction and HunspellCacheInfoAction");
 }
Suggestion importance[1-10]: 6

__

Why: The null check silently skips binding HunspellService, which would cause a cryptic injection failure at runtime when TransportHunspellCacheInvalidateAction or TransportHunspellCacheInfoAction are invoked. However, the test code explicitly passes null for hunspellService, suggesting this null-tolerance may be intentional for test scenarios, making a hard throw potentially too aggressive.

Low
General
Fix inaccurate concurrent removal count

ConcurrentHashMap.removeIf is not atomic with respect to size(), so the computed
count may be inaccurate in concurrent scenarios. More critically, entries added
between size() and removeIf could cause a negative count. Use an AtomicInteger
counter inside the removeIf predicate to accurately count removed entries.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [509-511]

-int sizeBefore = dictionaries.size();
-dictionaries.keySet().removeIf(key -> key.startsWith(prefix));
-int count = sizeBefore - dictionaries.size();
+java.util.concurrent.atomic.AtomicInteger count = new java.util.concurrent.atomic.AtomicInteger(0);
+dictionaries.keySet().removeIf(key -> {
+    if (key.startsWith(prefix)) {
+        count.incrementAndGet();
+        return true;
+    }
+    return false;
+});
+int removedCount = count.get();
+if (removedCount > 0) {
+    logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", removedCount, packageId);
+} else {
+    logger.debug("No cached dictionaries found for package: {}", packageId);
+}
+return removedCount;
Suggestion importance[1-10]: 5

__

Why: The sizeBefore - dictionaries.size() approach can yield inaccurate counts in concurrent scenarios, and the code itself acknowledges this is only for logging. Using an AtomicInteger inside removeIf is more accurate, though the impact is limited to diagnostic logging accuracy.

Low
Security
Fix incomplete path traversal validation logic

The path traversal check is incomplete: it misses sequences like foo/.. (without
trailing slash) and Windows-style foo... Since path separators are already rejected
in a subsequent check, the traversal check is partially redundant, but the ordering
means a value like ..foo (starts with ..) passes the startsWith(".") check but not
the equals("..") check. Rely on the path separator rejection first, or use
Path.normalize() comparison to detect traversal more robustly.

server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java [160-167]

-if (value.equals(".")
-    || value.equals("..")
-    || value.contains("./")
-    || value.contains("../")
-    || value.contains("\\.")
-    || value.contains("\\..")
-    || value.startsWith(".")
-    || value.endsWith(".")) {
+// Reject null bytes (security)
+if (value.contains("\0")) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Invalid %s: [%s]. Null bytes are not allowed.", paramName, value)
+    );
+}
 
+// Reject any path separators (Unix and Windows) - must be checked before traversal detection
+if (value.contains("/") || value.contains("\\")) {
+    throw new IllegalArgumentException(
+        String.format(
+            Locale.ROOT,
+            "Invalid %s: [%s]. Path separators (/, \\) are not allowed. "
+                + "Use ref_path for package ID and locale for dictionary locale.",
+            paramName,
+            value
+        )
+    );
+}
+
+// Reject path traversal attempts (no separators at this point, so only bare dots matter)
+if (value.equals(".") || value.equals("..")) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Invalid %s: [%s]. Path traversal sequences (., ..) are not allowed.", paramName, value)
+    );
+}
+
+// Reject cache key separator to prevent cache key injection
+if (value.contains(":")) {
+    throw new IllegalArgumentException(
+        String.format(
+            Locale.ROOT,
+            "Invalid %s: [%s]. Colon (:) is not allowed as it is used as cache key separator.",
+            paramName,
+            value
+        )
+    );
+}
+
Suggestion importance[1-10]: 5

__

Why: The path traversal check has some redundancy and edge cases (e.g., ..foo passes startsWith(".") but the check is startsWith(".") which would catch it). Since path separators are rejected in a subsequent check, the traversal detection is partially redundant, but the suggestion to simplify by checking separators first is a reasonable improvement for clarity and correctness.

Low
Suggestions up to commit 87eab2c
CategorySuggestion                                                                                                                                    Impact
Possible issue
Omit null fields from XContent response output

The toXContent method unconditionally writes null values for package_id, locale, and
cache_key fields, which contradicts the test testResponseToXContentOmitsNullValues
that expects these fields to be absent when null. Null fields should be
conditionally omitted to produce clean JSON output.

server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java [83-85]

-builder.field("package_id", packageId);
-builder.field("locale", locale);
-builder.field("cache_key", cacheKey);
+if (packageId != null) builder.field("package_id", packageId);
+if (locale != null) builder.field("locale", locale);
+if (cacheKey != null) builder.field("cache_key", cacheKey);
Suggestion importance[1-10]: 9

__

Why: The toXContent method unconditionally writes null values for package_id, locale, and cache_key, which directly contradicts the test testResponseToXContentOmitsNullValues that asserts these fields should be absent when null. This is a real bug that will cause test failures.

High
Fix constructor signature mismatch with test usage

The RestHunspellCacheInvalidateAction constructor takes no arguments, but the test
file RestHunspellCacheInvalidateActionTests instantiates it with a HunspellService
argument (new RestHunspellCacheInvalidateAction(mockService)). This mismatch will
cause a compilation error in the tests. The constructor should accept a
HunspellService parameter to match the test expectations, even if it's not used
directly (since the service is accessed via the transport layer).

server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java [56-57]

-public RestHunspellCacheInvalidateAction() {
+public RestHunspellCacheInvalidateAction(HunspellService hunspellService) {
 }
Suggestion importance[1-10]: 8

__

Why: The test RestHunspellCacheInvalidateActionTests instantiates RestHunspellCacheInvalidateAction with a HunspellService argument, but the actual constructor takes no arguments. This mismatch will cause a compilation error in the tests.

Medium
Pass HunspellService to REST handler constructor

RestHunspellCacheInvalidateAction is instantiated with no arguments in ActionModule,
but the test expects a constructor that takes a HunspellService. Additionally,
ActionModule now holds a hunspellService field that should be passed to the REST
handler if the constructor is updated. This ensures the handler is properly
initialized and consistent with the test expectations.

server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1091]

-registerHandler.accept(new RestHunspellCacheInvalidateAction());
+registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService));
Suggestion importance[1-10]: 7

__

Why: The file referenced in the suggestion (ActionModule.java) is incorrect (it's actually in server/src/main/java/org/opensearch/action/ActionModule.java), but the actual code change at line 1091 in ActionModule.java does instantiate RestHunspellCacheInvalidateAction() without arguments. If the constructor is updated to accept HunspellService, this call site must also be updated to pass hunspellService.

Medium
General
Ensure affix stream opened before dictionary streams

If an exception is thrown while opening one of the dicStreams (e.g., the second
file), previously opened streams in dicStreams are not closed until the finally
block. However, if affixStream is never assigned (exception before line affixStream
= Files.newInputStream(...)), IOUtils.close(affixStream) is called with null, which
is safe. The real risk is that IOUtils.close(dicStreams) in the finally block
correctly closes all opened streams. This is safe, but consider opening affixStream
before the loop to make the resource management clearer and consistent with the
existing loadDictionary pattern.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324]

-InputStream affixStream = null;
+InputStream affixStream = Files.newInputStream(affixFiles[0]);
 List<InputStream> dicStreams = new ArrayList<>(dicFiles.length);
 
 try {
     for (Path dicFile : dicFiles) {
         dicStreams.add(Files.newInputStream(dicFile));
     }
-    affixStream = Files.newInputStream(affixFiles[0]);
 
     try (Directory tmp = new NIOFSDirectory(env.tmpDir())) {
         return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
     }
 } catch (Exception e) {
     logger.error(() -> new ParameterizedMessage(
         "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e);
     throw e;
 } finally {
     IOUtils.close(affixStream);
     IOUtils.close(dicStreams);
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes reordering stream initialization for clarity, but the current code is functionally correct since IOUtils.close handles null safely and closes all opened streams in the finally block. This is a minor style/consistency improvement with low impact.

Low
Suggestions up to commit 3926f7a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Omit null fields from XContent serialization

The toXContent method always writes package_id, locale, and cache_key fields even
when they are null, which will serialize them as JSON null values. The test
testResponseToXContentOmitsNullValues explicitly expects these fields to be absent
when null. Add null checks before writing these fields.

server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java [83-85]

-builder.field("package_id", packageId);
-builder.field("locale", locale);
-builder.field("cache_key", cacheKey);
+if (packageId != null) builder.field("package_id", packageId);
+if (locale != null) builder.field("locale", locale);
+if (cacheKey != null) builder.field("cache_key", cacheKey);
Suggestion importance[1-10]: 9

__

Why: The toXContent method always writes package_id, locale, and cache_key even when null, but the test testResponseToXContentOmitsNullValues explicitly asserts these fields are absent when null. This is a real bug that will cause test failures.

High
Fix constructor signature mismatch with tests

The RestHunspellCacheInvalidateAction constructor takes no arguments, but the test
file instantiates it with a HunspellService parameter (new
RestHunspellCacheInvalidateAction(mockService)). This mismatch will cause a
compilation error. The constructor should accept a HunspellService parameter to
match the test usage, or the tests are testing a different version of the class.

server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java [56-57]

-public RestHunspellCacheInvalidateAction() {
+public RestHunspellCacheInvalidateAction(HunspellService hunspellService) {
 }
Suggestion importance[1-10]: 8

__

Why: The test file RestHunspellCacheInvalidateActionTests instantiates RestHunspellCacheInvalidateAction with a HunspellService argument, but the actual constructor takes no arguments. This mismatch will cause a compilation error.

Medium
Pass HunspellService to REST handler constructor

RestHunspellCacheInvalidateAction is instantiated without arguments here, but the
test expects a HunspellService constructor parameter. Additionally, the
hunspellService field is available in ActionModule and should be passed to the
handler so it can be used for direct local operations if needed.

server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/ActionModule.java [1091]

 // Hunspell cache management API
-registerHandler.accept(new RestHunspellCacheInvalidateAction());
+registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService));
Suggestion importance[1-10]: 7

__

Why: This is directly related to suggestion 2 — if the constructor is fixed to accept HunspellService, the registration in ActionModule must also pass hunspellService. The hunspellService field is available in ActionModule and should be passed here.

Medium
General
Fix stream opening order for proper resource cleanup

The affixStream is closed in the finally block after being passed to the Dictionary
constructor, but the Dictionary constructor may read from the stream during
construction. If the stream is closed before the Dictionary finishes using it (or if
the Directory is closed before the Dictionary is fully initialized), it could cause
issues. More critically, if an exception occurs while opening one of the dicStreams,
the already-opened streams won't be closed until the finally block, which is
correct, but the affixStream is opened after all dicStreams, so if opening
affixStream fails, dicStreams are still properly closed in finally. The logic is
mostly correct but the affixStream should be opened before the loop or the order
should be consistent.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324]

 InputStream affixStream = null;
 List<InputStream> dicStreams = new ArrayList<>(dicFiles.length);
 
 try {
+    affixStream = Files.newInputStream(affixFiles[0]);
     for (Path dicFile : dicFiles) {
         dicStreams.add(Files.newInputStream(dicFile));
     }
-    affixStream = Files.newInputStream(affixFiles[0]);
 
     try (Directory tmp = new NIOFSDirectory(env.tmpDir())) {
         return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
     }
 } catch (Exception e) {
     logger.error(() -> new ParameterizedMessage(
         "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e);
     throw e;
 } finally {
     IOUtils.close(affixStream);
     IOUtils.close(dicStreams);
 }
Suggestion importance[1-10]: 3

__

Why: The suggested reordering (opening affixStream before dicStreams) is a minor style/consistency improvement. The existing code is functionally correct since all streams are closed in the finally block regardless of order, so this is a low-impact change.

Low
Suggestions up to commit e84b5bc
CategorySuggestion                                                                                                                                    Impact
Possible issue
Align conditional registration of transport actions and REST handler

The REST handler for Hunspell cache management is only registered when
hunspellService is non-null, but the transport actions HunspellCacheInfoAction and
HunspellCacheInvalidateAction are always registered unconditionally. If
hunspellService is null, the transport actions will be registered but will fail at
injection time since HunspellService won't be bound in Guice. Either always register
the handler (and let the transport action handle a null service gracefully), or
guard the transport action registration with the same null check.

server/src/main/java/org/opensearch/action/ActionModule.java [1091-1093]

-// Hunspell cache management API
+actions.register(HunspellCacheInfoAction.INSTANCE, TransportHunspellCacheInfoAction.class);
+actions.register(HunspellCacheInvalidateAction.INSTANCE, TransportHunspellCacheInvalidateAction.class);
+// ... and in configure():
 if (hunspellService != null) {
-    registerHandler.accept(new RestHunspellCacheInvalidateAction(hunspellService));
+    bind(HunspellService.class).toInstance(hunspellService);
+    // register REST handler only when service is available
 }
+// Consider making HunspellService always non-null instead
Suggestion importance[1-10]: 7

__

Why: This is a real inconsistency: transport actions HunspellCacheInfoAction and HunspellCacheInvalidateAction are always registered unconditionally, but HunspellService is only bound in Guice when non-null, which would cause injection failures at runtime. This mismatch could lead to hard-to-diagnose errors.

Medium
Fix potential stream resource leak on failure

The affixStream is closed in the finally block, but it has already been consumed and
potentially closed by the Dictionary constructor. More critically, if
Files.newInputStream(affixFiles[0]) throws after some dicStreams have been opened,
those streams won't be closed since affixStream is still null and
IOUtils.close(affixStream) is a no-op. The dicStreams will be closed by
IOUtils.close(dicStreams), but the affix stream opened inside the try block may leak
if the Dictionary constructor fails after consuming it. Ensure all streams are
properly managed, ideally using try-with-resources for each stream.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [305-324]

-InputStream affixStream = null;
 List<InputStream> dicStreams = new ArrayList<>(dicFiles.length);
-
 try {
     for (Path dicFile : dicFiles) {
         dicStreams.add(Files.newInputStream(dicFile));
     }
-    affixStream = Files.newInputStream(affixFiles[0]);
-
-    try (Directory tmp = new NIOFSDirectory(env.tmpDir())) {
-        return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
+    try (InputStream affixStream = Files.newInputStream(affixFiles[0])) {
+        try (Directory tmp = new NIOFSDirectory(env.tmpDir())) {
+            return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
+        }
     }
 } catch (Exception e) {
     logger.error(() -> new ParameterizedMessage(
         "Could not load hunspell dictionary from package [{}] locale [{}]", packageId, locale), e);
     throw e;
 } finally {
-    IOUtils.close(affixStream);
     IOUtils.close(dicStreams);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that affixStream could leak if the Dictionary constructor fails, and using try-with-resources for affixStream is cleaner. However, the Dictionary constructor typically consumes and closes streams internally, and IOUtils.close in the finally block handles the null case, so the actual risk is moderate.

Low
Avoid computeIfAbsent with exception-throwing function

ConcurrentHashMap.computeIfAbsent does not allow the mapping function to throw
checked exceptions, but if the function throws an unchecked exception (like
IllegalStateException), the key will NOT be inserted into the map. However, a subtle
issue exists: if the load fails with an IllegalStateException, subsequent calls will
retry loading (which is correct), but the wrapping loses the original exception
type. More critically, computeIfAbsent can deadlock if called recursively on the
same key. Consider using a separate get-then-put pattern or documenting this
limitation clearly.

server/src/main/java/org/opensearch/indices/analysis/HunspellService.java [180-191]

-return dictionaries.computeIfAbsent(cacheKey, (key) -> {
-    try {
-        return loadDictionaryFromPackage(packageId, locale);
-    } catch (Exception e) {
+Dictionary existing = dictionaries.get(cacheKey);
+if (existing != null) {
+    return existing;
+}
+Dictionary loaded;
+try {
+    loaded = loadDictionaryFromPackage(packageId, locale);
+} catch (Exception e) {
+    throw new IllegalStateException(
+        String.format(Locale.ROOT,
+            "Failed to load hunspell dictionary for package [%s] locale [%s]",
+            packageId, locale), e);
+}
+Dictionary previous = dictionaries.putIfAbsent(cacheKey, loaded);
+return previous != null ? previous : loaded;
 
-        throw new IllegalStateException(
-            String.format(Locale.ROOT, 
-                "Failed to load hunspell dictionary for package [%s] locale [%s]", 
-                packageId, locale), e);
-    }
-});
-
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about computeIfAbsent with exception-throwing lambdas and potential recursive deadlock, but the existing code already wraps checked exceptions in IllegalStateException (unchecked), so the checked exception issue doesn't apply. The putIfAbsent pattern is a reasonable alternative but the improvement is moderate.

Low
General
Remove side effects from validation method

The validate() method has a side effect of mutating the object's fields (packageId,
locale, cacheKey) by trimming them. This is unexpected behavior for a validation
method and can cause issues if validate() is called multiple times or if the object
is inspected before and after validation. Trimming should be done in the setters or
in the transport action, not in validate().

server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java [86-89]

-// Update internal state with trimmed values
-this.packageId = trimmedPackageId;
-this.locale = trimmedLocale;
-this.cacheKey = trimmedCacheKey;
+// Do not mutate fields in validate(); trimming should be done in setters or transport layer.
+// Remove the three lines that reassign this.packageId, this.locale, this.cacheKey
Suggestion importance[1-10]: 5

__

Why: Mutating object state inside validate() is indeed a design smell and can cause subtle bugs if validate() is called multiple times. The improved_code doesn't show concrete replacement code, making it less actionable, but the concern is valid.

Low

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

❌ Gradle check result for e84b5bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

- Add ref_path parameter for package-based dictionary loading
- Load from config/packages/{packageId}/hunspell/{locale}/
- Add cache invalidation REST API (GET/POST /_hunspell/cache/_invalidate)
- Add TransportAction with cluster:admin permission
- Add comprehensive security validation (path traversal, separators, cache-key injection)
- Add updateable flag for hot-reload via _reload_search_analyzers
- Add comprehensive test coverage

PR feedback addressed:
- Stricter validate() to reject conflicting params
- Path traversal checks now use config/packages/ as base
- ref_path/locale validation rejects ., .., /, \, : characters

Signed-off-by: shayush622 <ayush5267@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit 3926f7a

@shayush622 shayush622 requested a review from cwperks March 4, 2026 16:30
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

❌ Gradle check result for 3926f7a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 87eab2c

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

❌ Gradle check result for 87eab2c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: shayush622 <ayush5267@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 1e9fbd5

- Add ref_path parameter for package-based dictionary loading
- Load from config/packages/{packageId}/hunspell/{locale}/
- Add cache info API: GET /_hunspell/cache (cluster:monitor/hunspell/cache)
- Add cache invalidation API: POST /_hunspell/cache/_invalidate (cluster:admin/hunspell/cache/invalidate)
- Support invalidation by package_id, locale, cache_key, or invalidate_all
- Add security validation (path traversal, separator injection, null bytes)
- Add updateable flag for hot-reload via _reload_search_analyzers
- Use Strings.hasText() and Strings.isNullOrEmpty() for validation consistency
- Consistent response schema with all fields always present
- Add unit tests, REST handler tests, and integration tests

Signed-off-by: shayush622 <ayush5267@gmail.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Persistent review updated to latest commit 20ba2c0

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

❌ Gradle check result for 20ba2c0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@shayush622
Copy link
Contributor Author

Superseded by #20792 — recreated with clean history

@shayush622 shayush622 closed this Mar 6, 2026
@github-actions github-actions bot mentioned this pull request Mar 6, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants