@badmonster0
Member

Originated from #573, but this is an overall redesign of the SplitRecursively function.

  • Allow users to specify a min_chunk_size
  • Plan the chunking holistically to minimize "cost", considering the following factors
    • AST structural level
    • Literal styles (new lines, double new lines)
    • Efficiency of overlap leverage
  • For indivisible elements (e.g. large comments, large strings), fall back to regexp-based text chunking
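
To illustrate the cost-minimizing idea, here is a minimal Python sketch of planning chunk boundaries with dynamic programming. It is not the code in this PR: the names `BOUNDARY_COST`, `candidate_breaks`, and `plan_chunks`, the cost values, and the `max_chunk_size` parameter are all assumptions made for illustration. It models only the literal-separator factor; AST levels and overlap are left out.

```python
import re

# Hypothetical cut costs: a double newline is the cheapest place to cut,
# a single newline costs more, and a space costs the most.
BOUNDARY_COST = {"\n\n": 1, "\n": 3, " ": 10}
SMALL_CHUNK_PENALTY = 100  # assumed penalty for chunks below min_chunk_size
PER_CHUNK_COST = 1         # assumed fixed cost per chunk, favors fewer chunks


def candidate_breaks(text):
    """Return sorted (offset, cost) pairs at which a chunk may end."""
    breaks = {}
    for sep, cost in BOUNDARY_COST.items():
        for match in re.finditer(re.escape(sep), text):
            end = match.end()
            breaks[end] = min(cost, breaks.get(end, cost))
    breaks[len(text)] = 0  # ending at the end of the text is free
    return sorted(breaks.items())


def plan_chunks(text, min_chunk_size, max_chunk_size):
    """Choose cut points that minimize the total cost over the whole text."""
    points = [(0, 0)] + [(p, c) for p, c in candidate_breaks(text) if p > 0]
    n = len(points)
    inf = float("inf")
    best = [inf] * n   # best[i]: minimal cost of chunking text[:points[i][0]]
    prev = [-1] * n    # prev[i]: index of the previous cut on the best path
    best[0] = 0
    for i in range(1, n):
        end, cut_cost = points[i]
        for j in range(i - 1, -1, -1):
            size = end - points[j][0]
            if size > max_chunk_size:
                break  # earlier start points only make the chunk larger
            if best[j] == inf:
                continue
            penalty = SMALL_CHUNK_PENALTY if size < min_chunk_size else 0
            cost = best[j] + cut_cost + penalty + PER_CHUNK_COST
            if cost < best[i]:
                best[i], prev[i] = cost, j
    if best[-1] == inf:
        # A span with no usable break point; this is where a fallback
        # chunker would be needed.
        raise ValueError("no valid chunking under max_chunk_size")
    chunks = []
    i = n - 1
    while i > 0:
        chunks.append(text[points[prev[i]][0]:points[i][0]])
        i = prev[i]
    return chunks[::-1]


text = "aaaa bbbb\ncccc dddd\n\neeee ffff\ngggg hhhh"
chunks = plan_chunks(text, min_chunk_size=8, max_chunk_size=25)
```

With these assumed costs, the planner cuts once at the double newline rather than greedily filling the first chunk up to `max_chunk_size`, which would force a cut at a space.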

@badmonster0 badmonster0 merged commit b2d6de5 into main May 30, 2025
8 checks passed
@badmonster0 badmonster0 deleted the exp-splitter-dp branch May 31, 2025 15:22