Semantic Chunking Chunk Size Bug #11

Description

@seankim658

LlamaIndex's SemanticSplitterNodeParser can sometimes produce chunks that are too large for the embedding model. Unfortunately, the parser exposes no maximum chunk length option to guard against this.

We will eventually have to subclass SemanticSplitterNodeParser and add a two-level safety net that naively splits oversized chunks into sub-chunks so they stay under the embedding model's input token limit.

Reference:
run-llama/llama_index#12270
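
Rough sketch of the safety net as a post-processing step (the eventual subclass would wrap the same logic). Names like `MAX_TOKENS` and `enforce_max_chunk_size` are placeholders, not llama_index APIs, and the whitespace token counter is a crude stand-in for the embedding model's actual tokenizer:

```python
from llama_index.core.node_parser import SemanticSplitterNodeParser, TokenTextSplitter
from llama_index.core.schema import TextNode

MAX_TOKENS = 512  # placeholder; set to the embedding model's real input limit

# Level 2: naive token-based splitter, used only when a semantic chunk is too big.
fallback_splitter = TokenTextSplitter(chunk_size=MAX_TOKENS, chunk_overlap=0)

# Crude stand-in; in practice this should use the embedding model's tokenizer.
count_tokens = lambda text: len(text.split())

def enforce_max_chunk_size(nodes, count_tokens):
    """Re-split any semantic chunk that exceeds the embedding model's token limit."""
    safe_nodes = []
    for node in nodes:
        text = node.get_content()
        if count_tokens(text) <= MAX_TOKENS:
            # Level 1: the semantic chunk already fits, keep it as-is.
            safe_nodes.append(node)
        else:
            # Level 2: naively split the oversized chunk into sub-chunks.
            for sub_text in fallback_splitter.split_text(text):
                safe_nodes.append(TextNode(text=sub_text, metadata=dict(node.metadata)))
    return safe_nodes

# Usage (assuming an embed_model and documents are in scope):
#   parser = SemanticSplitterNodeParser.from_defaults(embed_model=embed_model)
#   nodes = enforce_max_chunk_size(parser.get_nodes_from_documents(docs), count_tokens)
```

Post-processing the parser's output this way avoids depending on SemanticSplitterNodeParser internals; the subclass version would just run the same re-split inside node parsing.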

Metadata

Labels

bug: Something isn't working
