This is an n8n community node. It lets you use semantic text splitting in your n8n workflows.
Semantic text splitting is an advanced text processing technique that splits text based on semantic similarity rather than simple character or token counts. This node implements the methodology described in "5 Levels Of Text Splitting", which uses an embedding model to identify natural breakpoints where the meaning of the text shifts significantly.
n8n is a fair-code licensed workflow automation platform.
- Installation
- Operations
- Compatibility
- Usage
- Resources
- Version history
Follow the installation guide in the n8n community nodes documentation.
Quick Installation:
- Make sure community nodes are allowed by setting the environment variable N8N_COMMUNITY_PACKAGES_ENABLED=true.
- Once logged in to your n8n web UI, go to /settings/community-nodes and enter the package name @bitovi/n8n-nodes-semantic-text-splitter.
The Semantic Text Splitter node provides the following capabilities:
- Semantic Text Splitting: Splits text into semantically coherent chunks using embedding-based similarity analysis
- Configurable Parameters: Control splitting behavior with customizable thresholds and window sizes
- Integration with Embedding Models: Works with any n8n-compatible embedding model for similarity calculations
- Minimum n8n version: Compatible with n8n v1.0+
- Node.js version: Requires Node.js 18.10 or higher
- Dependencies: Requires @langchain/core and @langchain/textsplitters
This node is tested with the latest stable version of n8n and should work with most n8n installations that support community nodes.
The Semantic Text Splitter node requires an embedding model connection and supports the following parameters:
- Percentile Threshold for Breakpoints (default: 0.95): Controls how selective the algorithm is when identifying breakpoints. Higher values result in fewer, larger chunks.
- Number of Sentences to Consider in Sliding Window (default: 3): How many consecutive sentences are grouped into each sliding window before the window is embedded and compared with the next one.
- Minimum Chunk Size (default: 100): Minimum character length for generated chunks.
- Sentence Delimiters (default: ".!?"): Characters used to identify sentence boundaries.
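The parameters above map naturally onto a typed options object. The following TypeScript is a minimal, illustrative sketch and not the node's source: the names SemanticSplitterOptions, DEFAULT_OPTIONS and splitSentences are hypothetical, and only the default values come from this README. It also shows one naive reading of Sentence Delimiters, in which every listed character ends a sentence.

```typescript
// Hypothetical options shape mirroring the node's parameters.
// Field names are illustrative; the defaults are the documented ones.
interface SemanticSplitterOptions {
  percentileThreshold: number; // Percentile Threshold for Breakpoints
  windowSize: number;          // Number of Sentences to Consider in Sliding Window
  minChunkSize: number;        // Minimum Chunk Size, in characters
  sentenceDelimiters: string;  // each character in this string ends a sentence
}

const DEFAULT_OPTIONS: SemanticSplitterOptions = {
  percentileThreshold: 0.95,
  windowSize: 3,
  minChunkSize: 100,
  sentenceDelimiters: ".!?",
};

// Naive sentence splitter (an assumption, not the node's code): a sentence
// ends at any character listed in `delimiters`, and the delimiter stays
// attached to the sentence it closes.
function splitSentences(
  text: string,
  delimiters: string = DEFAULT_OPTIONS.sentenceDelimiters,
): string[] {
  const sentences: string[] = [];
  let current = "";
  for (const ch of text) {
    current += ch;
    if (delimiters.includes(ch)) {
      const trimmed = current.trim();
      if (trimmed) sentences.push(trimmed);
      current = "";
    }
  }
  // Keep any trailing text that has no closing delimiter.
  const rest = current.trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// splitSentences("Hi there. How are you? Fine!")
// returns ["Hi there.", "How are you?", "Fine!"]
```

Because every delimiter character ends a sentence, this naive splitter also breaks on decimals such as 3.14, on abbreviations, and on each dot of an ellipsis; the node's own sentence splitting may handle these cases differently.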
- Sentence Splitting: Text is first split into sentences using the configured delimiters
- Sliding Windows: Creates overlapping windows of sentences based on the window size
- Embedding Generation: Each window is converted to embeddings using the connected embedding model
- Distance Calculation: Calculates cosine distances between sequential window embeddings
- Breakpoint Detection: Places a breakpoint wherever the distance between consecutive windows exceeds the threshold derived from the Percentile Threshold for Breakpoints parameter
- Chunk Creation: Creates final text chunks based on identified breakpoints
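The steps above can be sketched in self-contained TypeScript. This is a minimal sketch under stated assumptions, not the node's implementation: it starts from sentences that are already split (the first step), assumes windows centered on each sentence, computes the percentile with linear interpolation as NumPy does by default, and assumes that chunks shorter than Minimum Chunk Size are merged into the following chunk. EmbeddingModel mirrors the embedDocuments method of LangChain.js embedding classes; semanticSplit, buildWindows, cosineDistance, percentile and toyModel are hypothetical names, and toyModel is a keyword-counting stand-in for a real embedding model.

```typescript
// Local stand-in for an embedding model, modeled on LangChain.js's embedDocuments.
interface EmbeddingModel {
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Sliding windows (assumed centered): window i joins sentence i with its
// neighbours, so windowSize 3 means previous + current + next sentence.
function buildWindows(sentences: string[], windowSize: number): string[] {
  const half = Math.floor(windowSize / 2);
  return sentences.map((_, i) =>
    sentences.slice(Math.max(0, i - half), i + half + 1).join(" "),
  );
}

// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1; // zero vector: treat as unrelated
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Percentile with linear interpolation; p is a fraction such as 0.95.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((x, y) => x - y);
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

async function semanticSplit(
  sentences: string[],
  model: EmbeddingModel,
  percentileThreshold = 0.95,
  windowSize = 3,
  minChunkSize = 100,
): Promise<string[]> {
  if (sentences.length < 2) return sentences.length ? [sentences.join(" ")] : [];
  // Embedding generation: one vector per window.
  const embeddings = await model.embedDocuments(buildWindows(sentences, windowSize));
  // Distance calculation: distances[i] compares window i with window i + 1.
  const distances: number[] = [];
  for (let i = 0; i + 1 < embeddings.length; i++) {
    distances.push(cosineDistance(embeddings[i], embeddings[i + 1]));
  }
  // Breakpoint detection: break after sentence i if distances[i] exceeds the percentile.
  const threshold = percentile(distances, percentileThreshold);
  // Chunk creation: accumulate sentences until a breakpoint is reached.
  const chunks: string[] = [];
  let current: string[] = [];
  sentences.forEach((sentence, i) => {
    current.push(sentence);
    if (i < distances.length && distances[i] > threshold) {
      chunks.push(current.join(" "));
      current = [];
    }
  });
  if (current.length > 0) chunks.push(current.join(" "));
  // Assumed handling of Minimum Chunk Size: merge short chunks forward.
  const merged: string[] = [];
  let carry = "";
  for (const chunk of chunks) {
    carry = carry ? `${carry} ${chunk}` : chunk;
    if (carry.length >= minChunkSize) {
      merged.push(carry);
      carry = "";
    }
  }
  if (carry && merged.length > 0) merged[merged.length - 1] += ` ${carry}`;
  else if (carry) merged.push(carry);
  return merged;
}

// Toy embedder for demonstration only: counts the words "cat" and "stock".
const toyModel: EmbeddingModel = {
  async embedDocuments(texts: string[]): Promise<number[][]> {
    return texts.map((t) => {
      const words = t.toLowerCase().split(/[^a-z]+/);
      return [words.filter((w) => w === "cat").length, words.filter((w) => w === "stock").length];
    });
  },
};

const demoSentences = [
  "The cat sat.", "The cat slept.", "The cat purred.",
  "Stock prices rose.", "The stock market rallied.", "Stock traders cheered.",
];

// Expected: two chunks, the three cat sentences and the three stock sentences.
semanticSplit(demoSentences, toyModel, 0.95, 3, 20).then((chunks) => console.log(chunks));
```

With the toy embedder, the largest window distance falls where the topic changes from cats to stocks, so that is the only distance above the 0.95 percentile and the only breakpoint. Raising percentileThreshold raises the cut-off, which is why higher values yield fewer, larger chunks. Passing the default minChunkSize of 100 instead would merge the two short demo chunks into one.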
- Connect a text input (e.g., from a document loader)
- Connect an embedding model (e.g., OpenAI Embeddings)
- Configure the splitting parameters based on your needs
- Connect the output to downstream processing nodes (e.g., vector store, document processing)
This approach tends to produce more coherent chunks than simple character- or token-based splitting, which makes it well suited to RAG (Retrieval Augmented Generation) applications and document processing workflows.
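For contrast, a naive fixed-size character splitter ignores sentence and topic boundaries entirely. The fixedSizeSplit helper below is hypothetical and written only for this comparison; it is not part of the node.

```typescript
// Hypothetical fixed-size character splitter, for comparison only:
// it slices every `size` characters with no regard for sentences or topics.
function fixedSizeSplit(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

const sample = "The cat sat. The cat slept. Stock prices rose.";
const fixedChunks = fixedSizeSplit(sample, 20);
// fixedChunks is ["The cat sat. The cat", " slept. Stock prices", " rose."]
```

The second chunk starts mid-sentence and mixes the end of the cat topic with the start of the stock topic, which is the kind of boundary a semantic splitter tries to avoid.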
- n8n community nodes documentation
- 5 Levels Of Text Splitting Tutorial
- LangChain Text Splitters Documentation
- v0.1.2 (Current): Latest stable release with full semantic splitting functionality
- Earlier v0.1.x: Initial releases with the core semantic text splitting features
Need guidance on leveraging AI agents or n8n for your business? Our AI Agents workshop will equip you with the knowledge and tools necessary to implement successful and valuable agentic workflows.