fix: preserve hierarchical indexing estimate rules #37025

Open
astordu wants to merge 1 commit into langgenius:main from astordu:fix-hierarchical-indexing-estimate

@astordu astordu commented Jun 3, 2026

Summary

  • Preserve parent-child indexing estimate fields during request validation
  • Keep parent_mode and subchunk_segmentation when normalizing process rules
  • Add a regression test for hierarchical indexing estimate validation

Problem

When creating a knowledge base with parent-child chunking, the preview endpoint can return zero chunks even though the uploaded document is parsed correctly. The request payload includes parent_mode and subchunk_segmentation, but DocumentService.estimate_args_validate rebuilds process_rule from _EstimateRules, which previously allowed only pre_processing_rules and segmentation. As a result, the parent-child-specific fields are dropped before indexing estimation.
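The dropping behavior described above can be illustrated with a minimal sketch. This is not the project's code: `rebuild_rules`, `OLD_ALLOWED_RULE_KEYS`, and the payload values are hypothetical stand-ins, and a plain allow-list is assumed in place of whatever `_EstimateRules` actually does.

```python
from typing import Any

# Hypothetical stand-in for the old allow-list; NOT the real _EstimateRules.
OLD_ALLOWED_RULE_KEYS = ("pre_processing_rules", "segmentation")


def rebuild_rules(rules: dict[str, Any], allowed: tuple[str, ...]) -> dict[str, Any]:
    """Rebuild a rules dict, keeping only the allow-listed keys."""
    return {key: rules[key] for key in allowed if key in rules}


# Illustrative payload; the field values are assumptions made for this example.
payload_rules = {
    "pre_processing_rules": [{"id": "remove_extra_spaces", "enabled": True}],
    "segmentation": {"separator": "\n\n", "max_tokens": 1024},
    "parent_mode": "paragraph",
    "subchunk_segmentation": {"separator": "\n", "max_tokens": 512},
}

rebuilt = rebuild_rules(payload_rules, OLD_ALLOWED_RULE_KEYS)

# Keys present in the request but missing after the rebuild.
dropped = sorted(set(payload_rules) - set(rebuilt))
print(dropped)  # ['parent_mode', 'subchunk_segmentation']
```

Under this assumption, anything downstream that reads `parent_mode` or `subchunk_segmentation` from the rebuilt rules sees neither field, which is consistent with a preview that yields no child chunks.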

Test

  • python -m pytest -o addopts='' tests/unit_tests/services/test_dataset_service_document.py -k estimate_args_validate_preserves_hierarchical_chunking_rules -q
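For readers without the repository at hand, the shape of such a regression check might look like the sketch below. It runs against a hypothetical `normalize_process_rule` helper with an extended allow-list, not against `DocumentService.estimate_args_validate`; the function names, the `"hierarchical"` mode value, and the rule values are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical extended allow-list; the real change is made in _EstimateRules.
ALLOWED_RULE_KEYS = (
    "pre_processing_rules",
    "segmentation",
    "parent_mode",
    "subchunk_segmentation",
)


def normalize_process_rule(process_rule: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical normalizer: keep the mode and only allow-listed rule keys."""
    rules = process_rule.get("rules", {})
    kept = {key: rules[key] for key in ALLOWED_RULE_KEYS if key in rules}
    return {"mode": process_rule.get("mode"), "rules": kept}


def test_normalize_preserves_hierarchical_fields() -> None:
    # Illustrative request body; the values are assumptions.
    process_rule = {
        "mode": "hierarchical",
        "rules": {
            "pre_processing_rules": [],
            "segmentation": {"separator": "\n\n", "max_tokens": 1024},
            "parent_mode": "paragraph",
            "subchunk_segmentation": {"separator": "\n", "max_tokens": 512},
        },
    }
    normalized = normalize_process_rule(process_rule)
    # The parent-child fields must survive normalization unchanged.
    assert normalized["rules"]["parent_mode"] == "paragraph"
    assert normalized["rules"]["subchunk_segmentation"] == {
        "separator": "\n",
        "max_tokens": 512,
    }


test_normalize_preserves_hierarchical_fields()
```

Asserting on the exact field values, rather than only on key presence, would also catch a normalizer that keeps the keys but resets them to defaults.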

dosubot (bot) added the size:XS label (this PR changes 0-9 lines, ignoring generated files) on Jun 3, 2026.