Skip to content

Conversation

@marevol
Copy link
Contributor

@marevol marevol commented Nov 17, 2025

Major improvements:

  • Extracted common filter processing logic into reusable methods (processFilters, addFailureExclusionFilters)
  • Extracted crawler setup logic into dedicated methods (setupWebCrawler, setupFileCrawler)
  • Extracted crawler execution logic into executeCrawlers method
  • Reduced code duplication by ~250 lines

Code quality improvements:

  • Fixed incorrect break statements (should be continue)
  • Removed unused code (ComponentUtil.getFessConfig() call)
  • Simplified AtomicBoolean usage in filter processing
  • Improved error messages with config names
  • Fixed issue where session IDs were added even for invalid configs

The doCrawl method is now much more readable and maintainable, going from ~380 lines to ~65 lines while preserving all functionality.

Major improvements:
- Extracted common filter processing logic into reusable methods
  (processFilters, addFailureExclusionFilters)
- Extracted crawler setup logic into dedicated methods
  (setupWebCrawler, setupFileCrawler)
- Extracted crawler execution logic into executeCrawlers method
- Reduced code duplication by ~250 lines

Code quality improvements:
- Fixed incorrect break statements (should be continue)
- Removed unused code (ComponentUtil.getFessConfig() call)
- Simplified AtomicBoolean usage in filter processing
- Improved error messages with config names
- Fixed issue where session IDs were added even for invalid configs

The doCrawl method is now much more readable and maintainable,
going from ~380 lines to ~65 lines while preserving all functionality.
@marevol marevol requested a review from Copilot November 17, 2025 20:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the WebFsIndexHelper class by extracting the lengthy doCrawl method into smaller, focused methods. The main changes improve code maintainability by eliminating ~250 lines of duplicated logic between web and file crawling, while also fixing several bugs in control flow and error handling.

Key changes:

  • Extracted filter processing logic into reusable processFilters and addFailureExclusionFilters methods
  • Created dedicated setup methods (setupWebCrawler, setupFileCrawler) and a common configureCrawler method
  • Fixed control flow bugs (incorrect break statements changed to continue via early returns)
  • Improved error messages to include config names for better debugging

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Added 28 new test methods covering the refactored functionality:

New method test coverage:
- processFilters: 10 tests covering null/empty strings, include/exclude
  filters, comments, #DISABLE_URL_ENCODE directive, special characters
- addFailureExclusionFilters: 4 tests covering null/empty lists, URLs,
  duplicates, and blank URLs
- configureCrawler: 3 tests covering parameter settings, unlimited depth,
  default values, and custom priority
- handleCleanupConfig: 4 tests covering cleanup modes (all, url filters,
  none, empty map)
- setupWebCrawler/setupFileCrawler: 2 tests for blank URL/path validation

Test scenarios include:
- Edge cases (null, empty, whitespace-only inputs)
- Valid inputs with various configurations
- URL encoding behavior with special directive
- Comment line filtering
- Duplicate handling
- Special character handling
- Multiple filter types (URL and Path)

Total test methods: 58 (increased from 30)
Total test file size: 804 lines (increased from 437)
Added mockito-core 4.11.0 as a test dependency to support the
comprehensive unit tests added for WebFsIndexHelper.

The tests use Mockito for mocking crawler components, system helpers,
and other dependencies to enable isolated unit testing of the
refactored methods.
Replaced Mockito mocks with anonymous inner classes for
CrawlingConfigHelper registration, following the pattern used in
DataIndexHelperTest. The Lasta Di container requires components to
be registered as actual class instances, not Mockito mocks.

This fixes the ComponentNotFound errors in:
- test_addFailureExclusionFilters_withNullList
- test_addFailureExclusionFilters_withEmptyList
- test_addFailureExclusionFilters_withUrls
- test_addFailureExclusionFilters_withDuplicates
- test_addFailureExclusionFilters_withBlankUrls
Created a test stub implementation of CrawlingConfigHelper that can
be properly registered in the Lasta Di container and configured for
different test scenarios.

Changes:
- Created TestCrawlingConfigHelper.java with configurable behavior
  for testing
- Registered TestCrawlingConfigHelper in test_app.xml as a singleton
  component
- Updated test methods to get the component from DI container and
  configure it per test instead of trying to register new instances

This fixes the ComponentNotFound errors in:
- test_addFailureExclusionFilters_withNullList
- test_addFailureExclusionFilters_withEmptyList
- test_addFailureExclusionFilters_withUrls
- test_addFailureExclusionFilters_withDuplicates
- test_addFailureExclusionFilters_withBlankUrls

The DI container now properly resolves the crawlingConfigHelper
component when ComponentUtil.getCrawlingConfigHelper() is called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants