Splits the text based on semantic similarity. Check the official LangChain documentation page
breakpoint_threshold_type must be one between:
- "percentile"
- "standard_deviation"
- "interquartile"
the recommended values by langchain for breakpoint_threshold_amount are:
"percentile": 95
"standard_deviation": 3
"interquartile": 1.5
NOTE: The SemanticChunker is an experimental component in LangChain, it couldn't work or produce some errors.
