-
Notifications
You must be signed in to change notification settings - Fork 168
Improve data store handling with thread safety and resource management enhancements #2954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve data store handling with thread safety and resource management enhancements #2954
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request enhances thread safety and error handling in the data store implementation. The changes focus on ensuring safe concurrent access to shared state and improving robustness when dealing with plugin files.
Key changes:
- Introduced thread-safe collections (
ConcurrentHashMap,CopyOnWriteArrayList) and volatile variables to protect shared state across threads - Added null checks and graceful error handling for missing plugin files and configuration files
- Fixed import statement to use the correct
StringUtilpackage
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| FileListIndexUpdateCallbackImpl.java | Changed deleteUrlList to CopyOnWriteArrayList for thread safety and corrected StringUtil import |
| DataStoreFactory.java | Converted dataStoreMap to ConcurrentHashMap, made dataStoreNames and lastLoadedTime volatile, synchronized getDataStoreNames method, and added null/existence checks for plugin files |
| AbstractDataStore.java | Made alive flag volatile for thread visibility and relocated the default constructor |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** List of URLs to be deleted, cached for batch processing. Thread-safe using CopyOnWriteArrayList. */ | ||
| protected List<String> deleteUrlList = new CopyOnWriteArrayList<>(); |
Copilot
AI
Nov 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using CopyOnWriteArrayList for deleteUrlList may cause performance issues if URLs are frequently added. This collection creates a copy of the underlying array on every write operation, which could be expensive for batch processing scenarios. Consider using a synchronized ArrayList or Collections.synchronizedList() if the list is primarily accessed within synchronized blocks during batch operations.
| /** List of URLs to be deleted, cached for batch processing. Thread-safe using CopyOnWriteArrayList. */ | |
| protected List<String> deleteUrlList = new CopyOnWriteArrayList<>(); | |
| /** List of URLs to be deleted, cached for batch processing. Thread-safe using Collections.synchronizedList. */ | |
| protected List<String> deleteUrlList = Collections.synchronizedList(new ArrayList<>()); |
| * Thread-safe implementation using ConcurrentHashMap for concurrent access. | ||
| */ | ||
| protected Map<String, DataStore> dataStoreMap = new LinkedHashMap<>(); | ||
| protected Map<String, DataStore> dataStoreMap = new ConcurrentHashMap<>(); |
Copilot
AI
Nov 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching from LinkedHashMap to ConcurrentHashMap removes insertion-order guarantees. If any code depends on the iteration order of data stores, this change could introduce subtle bugs. Verify that insertion order is not required for this map's usage.
…t enhancements This commit addresses several improvements in the data store processing implementation: **Thread Safety Improvements:** - DataStoreFactory: Changed dataStoreMap from LinkedHashMap to ConcurrentHashMap for thread-safe concurrent access - DataStoreFactory: Made dataStoreNames and lastLoadedTime volatile to ensure visibility across threads - DataStoreFactory: Made getDataStoreNames() synchronized to prevent race conditions during cache refresh - AbstractDataStore: Made alive field volatile to ensure proper visibility in multi-threaded scenarios - FileListIndexUpdateCallbackImpl: Changed deleteUrlList from ArrayList to CopyOnWriteArrayList for thread-safe access from executor threads **Resource Handling:** - DataStoreFactory: Added null check for jarFiles array to prevent NullPointerException - DataStoreFactory: Added existence check for XML configuration files before attempting to read them, reducing unnecessary exception handling **Code Quality:** - AbstractDataStore: Moved static logger declaration before instance fields following Java best practices - FileListIndexUpdateCallbackImpl: Fixed incorrect StringUtil import (was using Apache POI's StringUtil instead of Fess's) These improvements enhance the reliability and thread safety of the data store processing layer, particularly important for concurrent crawling operations.
Changed deleteUrlList from CopyOnWriteArrayList to ArrayList since all access is already protected by synchronized(indexUpdateCallback) blocks. CopyOnWriteArrayList creates a copy on every write operation which is unnecessary overhead when synchronization is already in place. Changes: - Replaced CopyOnWriteArrayList with ArrayList for deleteUrlList - Added synchronized block in commit() method for consistency - Updated javadoc to clarify synchronization strategy This improves performance for batch delete operations by eliminating unnecessary array copying on each URL addition.
Fixed two failing tests that were using 'echo' commands which complete almost instantly, making it impossible to verify process running state: 1. test_startProcess_replaceExistingProcess: - Changed from 'echo' to 'sleep 10' commands - Added proper wait time before checking process state - Added verification that only one process exists for the session 2. test_destroyProcess_withRunningProcess: - Changed from 'echo' to 'sleep 10' command - Simplified polling logic - just wait 100ms for process to start - Updated exit code assertion (forcibly destroyed processes may have non-zero exit codes) These changes ensure the tests can reliably verify that processes are running before attempting to check their state or destroy them.
Added extensive test coverage for the thread safety improvements made to the data store handling layer. DataStoreFactoryTest additions: - test_add_concurrentAccess: Verifies ConcurrentHashMap handles concurrent additions - test_getDataStore_concurrentWithAdd: Tests concurrent read/write operations - test_getDataStoreNames_concurrentAccess: Verifies synchronized method correctness - test_volatileFields_visibility: Ensures volatile fields are visible across threads - test_cacheRefresh_withConcurrentReads: Validates cache refresh with concurrent access - test_nullSafety_concurrentAccess: Tests null handling under concurrent load AbstractDataStoreTest additions: - test_aliveField_volatileVisibility: Verifies volatile alive field visibility - test_stop_volatileVisibility: Tests stop() method sets alive and is visible - test_stop_concurrentAccess: Validates concurrent calls to stop() - test_aliveField_concurrentReadWrite: Tests concurrent read/write of alive field - test_getName_concurrentAccess: Verifies getName() is thread-safe FileListIndexUpdateCallbackImplTest additions: - test_deleteUrlList_synchronizedAccess: Validates ArrayList with synchronized blocks - test_deleteUrlList_concurrentReads: Tests concurrent reads from deleteUrlList - test_deleteUrlList_clearOperation: Verifies clear() operation thread safety - test_deleteUrlList_iteration: Tests iteration with proper synchronization - test_deleteUrlList_isEmptyCheck: Validates isEmpty() check under concurrent access These tests verify: 1. ConcurrentHashMap correctly handles concurrent operations in DataStoreFactory 2. Volatile fields ensure proper visibility across threads 3. Synchronized methods prevent race conditions during cache refresh 4. ArrayList works correctly when all access is properly synchronized 5. No ConcurrentModificationException occurs in any concurrent scenario All tests use minimal or no sleep to maintain fast test execution time.
Based on feedback that dataStoreMap is not accessed by multiple threads, reverted the ConcurrentHashMap change back to LinkedHashMap. Changes: - Reverted dataStoreMap from ConcurrentHashMap to LinkedHashMap - Removed ConcurrentHashMap import - Removed dataStoreMap-related concurrency tests: * test_add_concurrentAccess * test_getDataStore_concurrentWithAdd * test_nullSafety_concurrentAccess Kept volatile fields and synchronized getDataStoreNames(): - lastLoadedTime and dataStoreNames remain volatile (for cache visibility) - getDataStoreNames() remains synchronized (for cache refresh safety) - Related tests for getDataStoreNames() concurrency are maintained The dataStoreMap is accessed in single-threaded context during application initialization and data store registration.
3e6abba to
ffe107b
Compare
This pull request introduces several improvements to thread safety and error handling in the data store and index update callback implementations. The most significant changes involve switching to thread-safe collections and variables, adding synchronization and volatile keywords for shared state, and improving robustness when loading plugin files.
Thread safety improvements:
dataStoreMapinDataStoreFactoryto useConcurrentHashMapfor safe concurrent access.dataStoreNamesandlastLoadedTimeinDataStoreFactoryvolatileto ensure visibility across threads, and synchronized thegetDataStoreNamesmethod. [1] [2]aliveflag inAbstractDataStoreto bevolatilefor safe multi-threaded access.deleteUrlListinFileListIndexUpdateCallbackImplto useCopyOnWriteArrayListfor thread-safe batch processing.Error handling and robustness:
loadDataStoreNameListinDataStoreFactoryto handlenullplugin JAR files and missing configuration files gracefully, with debug logging for missing files.Dependency and import fixes:
StringUtilinFileListIndexUpdateCallbackImplto use the correct package.CopyOnWriteArrayListinFileListIndexUpdateCallbackImpl.Code organization:
AbstractDataStoreto a more logical location in the file. [1] [2]