Kafka Text Embedding Inference

Note: This project is part of my bachelor thesis at the University of Vienna. The thesis and presentation can be found in the documents directory.

A streamlined Java library for building Kafka streaming pipelines that generate text embeddings using huggingface/text-embeddings-inference (TEI) - a highly optimized inference service.

This library handles the data preparation phase of typical Retrieval Augmented Generation (RAG) pipelines in Kafka by efficiently processing and embedding text at scale. A pipeline built with it performs the following steps:

Consume messages from Kafka
Batch messages for optimal processing
Split messages into chunks
Generate embeddings using the TEI gRPC API
Produce enriched messages back to Kafka
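
The five steps above can be sketched in plain Java without any Kafka or gRPC dependency. This is only an illustration of the data flow, not the library's API: `PipelineSketch`, `Embedded`, `batch`, and `process` are hypothetical names, the input list stands in for consumed Kafka messages, and the embedder is a stub where a real pipeline would call TEI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the pipeline stages; not the library's actual classes.
final class PipelineSketch {

    // A chunk of text together with its embedding vector.
    static final class Embedded {
        final String text;
        final float[] vector;

        Embedded(String text, float[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    // Step 2: group consumed messages into batches of at most batchSize.
    static <T> List<List<T>> batch(List<T> messages, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += batchSize) {
            int end = Math.min(i + batchSize, messages.size());
            batches.add(new ArrayList<>(messages.subList(i, end)));
        }
        return batches;
    }

    // Steps 3 and 4 for one batch: chunk every message, then embed every chunk.
    static List<Embedded> process(List<String> batch,
                                  Function<String, List<String>> chunker,
                                  Function<String, float[]> embedder) {
        List<Embedded> out = new ArrayList<>();
        for (String message : batch) {
            for (String chunk : chunker.apply(message)) {
                out.add(new Embedded(chunk, embedder.apply(chunk)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 1 stand-in: pretend these were consumed from the input topic.
        List<String> consumed = Arrays.asList("First sentence. Second sentence.", "Another message.");
        // Split after each period that is followed by whitespace.
        Function<String, List<String>> chunker = m -> Arrays.asList(m.split("(?<=\\.)\\s+"));
        // Stub embedder: a real pipeline would send the chunk to TEI here.
        Function<String, float[]> embedder = c -> new float[] {c.length()};
        for (List<String> b : batch(consumed, 2)) {
            for (Embedded e : process(b, chunker, embedder)) {
                // Step 5 stand-in: print instead of producing to the output topic.
                System.out.println(e.text + " -> " + e.vector[0]);
            }
        }
    }
}
```

Passing the chunker and embedder as functions keeps each stage replaceable, which mirrors the customizable chunking the library describes.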

Features

Efficient Message Batching: Optimizes throughput when embedding messages from Kafka
Configurable Pipeline: Easy setup with CLI options for all necessary configurations
Customizable Chunking: Flexible chunking stage
Message Serialization: Support for different serialization formats
TEI Service Integration: Direct integration with huggingface/text-embeddings-inference via gRPC
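
To illustrate what a pluggable chunking stage could look like, here is a minimal sketch of a word-window chunker with overlap. The `Chunker` interface and `WordWindowChunker` class are hypothetical and written for this example only; the library's real chunking contract may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunking contract; the library's real interface may differ.
interface Chunker {
    List<String> chunk(String text);
}

// Splits text into windows of at most maxWords words; consecutive windows share `overlap` words.
final class WordWindowChunker implements Chunker {
    private final int maxWords;
    private final int overlap;

    WordWindowChunker(int maxWords, int overlap) {
        if (maxWords < 1 || overlap < 0 || overlap >= maxWords) {
            throw new IllegalArgumentException("need maxWords >= 1 and 0 <= overlap < maxWords");
        }
        this.maxWords = maxWords;
        this.overlap = overlap;
    }

    @Override
    public List<String> chunk(String text) {
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to embed
        }
        String[] words = trimmed.split("\\s+");
        int step = maxWords - overlap; // always >= 1 because overlap < maxWords
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

Overlap keeps some context across chunk boundaries at the cost of embedding a few words twice; setting it to 0 gives plain fixed-size chunks.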

Prerequisites

A running Kafka cluster with input and output topics
A running huggingface/text-embeddings-inference service

Configuration

--batch-size         # Kafka batch size (default: 1)
--bootstrap-server   # Kafka bootstrap server address
--schema-registry    # Schema registry URL
--input-topic        # Kafka input topic
--output-topic       # Kafka output topic
--tei-host           # TEI service host
--tei-port           # TEI service port (default: 50051)
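
As an illustration of how these options and their defaults fit together, the following sketch parses them into a map. It is not the library's own CLI handling: `OptionsSketch` is a hypothetical class, and the `--name value` form (separated by a space) is an assumption. Only the option names and the two defaults come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for illustration; not the library's CLI implementation.
final class OptionsSketch {

    // Parses "--name value" pairs (assumed syntax) on top of the documented defaults.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        // Defaults as documented in the option list.
        options.put("batch-size", "1");
        options.put("tei-port", "50051");
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("expected --name value pairs, got: " + args[i]);
            }
            options.put(args[i].substring(2), args[i + 1]);
        }
        return options;
    }

    public static void main(String[] args) {
        // With no arguments this prints only the two defaults.
        System.out.println(parse(args));
    }
}
```

Options without a documented default, such as `--bootstrap-server` or `--input-topic`, are simply absent from the map until they are passed.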

Demo setup: paper-inference-app

An example implementation that demonstrates how to use the library to:

Process academic papers from Kafka
Naively chunk paper abstracts for embedding generation
Produce embedded papers to Kafka in a format compatible with Qdrant sink connector
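
To make the last point more concrete, the sketch below builds one output message as a JSON string. It assumes the Qdrant sink connector's message layout of `collection_name`, `id`, `vector`, and `payload`; consult the connector's documentation for the authoritative schema. The class `QdrantMessageSketch`, the collection name, and the `title` payload field are hypothetical, and the demo application's actual output fields may differ.

```java
import java.util.StringJoiner;

// Hypothetical helper; shows the assumed message shape, not the demo app's real code.
final class QdrantMessageSketch {

    // Minimal JSON string escaping: backslash, double quote, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes one embedded chunk; assumes the vector contains only finite values.
    static String toJson(String collection, long id, float[] vector, String title) {
        StringJoiner values = new StringJoiner(",", "[", "]");
        for (float f : vector) {
            values.add(Float.toString(f));
        }
        return "{\"collection_name\":" + quote(collection)
                + ",\"id\":" + id
                + ",\"vector\":" + values
                + ",\"payload\":{\"title\":" + quote(title) + "}}";
    }

    public static void main(String[] args) {
        // "papers" and the title are placeholder values.
        System.out.println(toJson("papers", 1L, new float[] {0.5f, 1.0f}, "Example paper"));
    }
}
```

In a real application a JSON library would be the safer choice; the hand-rolled serialization here only keeps the example free of dependencies.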

It also includes a demo environment using Docker Compose to showcase the pipeline in action.

Prerequisites:

Docker and Docker Compose installed

Start all services by running the command for your system architecture:
arm64 (e.g. Apple Silicon):

docker compose -f paper-inference-app/demo/docker-compose.yaml -f paper-inference-app/demo/docker-compose.arm64.yaml up -d

x86/amd64:

docker compose -f paper-inference-app/demo/docker-compose.yaml up -d

Note: Startup may take a few minutes before all services are available.

This command launches:
A fully initialized Kafka cluster with topics for input and output
A producer that automatically fetches and produces papers from Europe PMC API
The huggingface/text-embeddings-inference service for generating embeddings
Redpanda Console for monitoring the pipeline
Qdrant Vector Database for storing embedded papers
A configured Kafka Connect instance with the Qdrant sink connector, which automatically writes embedded papers to Qdrant
The showcased paper-inference-app itself, which processes and embeds the papers

The Redpanda Console dashboard is available at http://localhost:8080
This interface can be used to observe messages flowing through the pipeline, monitor topics, and inspect messages in real-time.
The Qdrant dashboard is available at http://localhost:6333/dashboard
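
Besides the dashboards, the Qdrant instance can be checked programmatically. The sketch below lists collections through Qdrant's REST API (`GET /collections`), using the JDK's built-in HTTP client. The class name `QdrantCheckSketch` is hypothetical, and the base URL assumes the demo's default port shown above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for checking the demo's Qdrant instance.
final class QdrantCheckSketch {

    // Builds a GET request for Qdrant's collection listing endpoint.
    static HttpRequest listCollections(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/collections"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        try {
            HttpResponse<String> response = client.send(
                    listCollections("http://localhost:6333"),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            // Services may still be starting; see the startup note above.
            System.out.println("Qdrant not reachable yet: " + e.getMessage());
        }
    }
}
```

Once the sink connector has written its first embedded papers, the response body should include the collection it writes to.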

Screenshot: overview of the services running in the demo setup once fully initialized.

The Redpanda Console dashboard provides insight into the Kafka workflow. The screenshot shows the two main topics used by the demo application. The inference-test-paper topic contains the input messages for the demo application. The inference-test-embedded-paper topic contains the output messages written by the demo application.

The inference-test-paper input topic contains papers written by the paper-producer application, which downloads papers from Europe PMC and produces them to Kafka.

This screenshot shows the output topic inference-test-embedded-paper. These messages are the output written by the demo application. They contain the vector embedding along with information from the input message and are in a JSON format compatible with the Qdrant vector database.
