Kafka Streams Topic Naming Example

A comprehensive example demonstrating explicit topic naming conventions for Kafka Streams applications, including changelog, repartition, and windowed state store topics.

Overview

This project implements a time-windowed aggregation Kafka Streams application that showcases:

  • Explicit changelog topic naming using Materialized.as()
  • Explicit repartition topic naming using Grouped.as()
  • Predictable topic names following organizational standards
  • Manual topic creation with auto-creation disabled
  • Time-windowed aggregation with 5-minute tumbling windows
  • Comprehensive testing with TopologyTestDriver and Testcontainers

Key Features

PRD Compliance

This implementation fully complies with the PRD requirements:

  • ✅ All topics use explicit, predictable naming conventions
  • ✅ Auto topic creation is disabled (auto.create.topics.enable=false)
  • ✅ Topic names follow organizational pattern: {{domain_id}}-{{environment}}-{{accessibility}}-{{service}}-{{function}}
  • ✅ Changelog and repartition topics are explicitly named
  • ✅ All topics can be pre-created and managed via IaC
  • ✅ CI/CD ready with Docker Compose and comprehensive tests

Architecture

Input Topic                          Kafka Streams Application                   Output Topic
┌─────────────────────────────┐     ┌──────────────────────────────┐      ┌──────────────────────────────┐
│cus-s-pub-windowed-agg-input │────▶│  WindowedAggregation         │─────▶│cus-s-pub-windowed-agg-output │
│                             │     │                              │      │                              │
│ Events (JSON)               │     │  • Group by key              │      │ Aggregated (JSON)            │
│ - user-1, click, 10         │     │  • 5-min tumbling window     │      │ - count, sum, avg            │
│ - user-1, view, 20          │     │  • Aggregate statistics      │      │ - window timestamps          │
└─────────────────────────────┘     └──────────────────────────────┘      └──────────────────────────────┘
                                              │               │
                                              ▼               ▼
                                    ┌──────────────────┐  ┌────────────────────────┐
                                    │   Repartition    │  │      Changelog         │
                                    │      Topic       │  │        Topic           │
                                    └──────────────────┘  └────────────────────────┘
                                    cus-s-pub-windowed-   cus-s-pub-windowed-agg-
                                    agg-events-by-key-    event-count-store-
                                    repartition           changelog

Quick Start

Prerequisites

  • Java 17+
  • Maven 3.6+
  • Docker and Docker Compose

1. Clone and Build

# Clone the repository
git clone <repo-url>
cd kafka-streams-using-topic-naming

# Build the project
mvn clean package

2. Start Kafka

# Start Kafka, Schema Registry, and Kafka UI
docker-compose up -d

# Verify topics were created
docker exec windowed-agg-broker kafka-topics --list --bootstrap-server localhost:9092

Expected topics:

  • cus-s-pub-windowed-agg-input
  • cus-s-pub-windowed-agg-output
  • cus-s-pub-windowed-agg-event-count-store-changelog
  • cus-s-pub-windowed-agg-events-by-key-repartition

3. Run the Application

# Run via Maven
mvn exec:java -Dexec.mainClass="com.github.osodevops.kafka.StreamsApplication"

# Or run the JAR
java -jar target/kafka-streams-using-topic-naming-1.0.0-SNAPSHOT-jar-with-dependencies.jar

4. Produce Sample Data

# Make script executable
chmod +x scripts/produce-sample-data.sh

# Generate and produce 50 sample events
./scripts/produce-sample-data.sh localhost:9092 50

5. View Results

Option 1: Kafka Console Consumer

kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic cus-s-pub-windowed-agg-output \
  --from-beginning \
  --property print.key=true \
  --property key.separator=': '

Option 2: Kafka UI

Open http://localhost:8080 and navigate to Topics → cus-s-pub-windowed-agg-output

6. Cleanup

# Stop application (Ctrl+C)

# Stop Docker services
docker-compose down

# Remove all data
docker-compose down -v

Project Structure

kafka-streams-using-topic-naming/
├── pom.xml                           # Maven configuration
├── docker-compose.yml                # Local Kafka environment
├── README.md                         # This file
│
├── src/main/java/com/github/osodevops/kafka/
│   ├── StreamsApplication.java       # Main application
│   ├── config/
│   │   └── TopicConfig.java          # Centralized topic naming
│   ├── model/
│   │   ├── Event.java                # Input event model
│   │   └── AggregatedEvent.java      # Output aggregation model
│   ├── serde/
│   │   └── JsonSerdes.java           # JSON serializers/deserializers
│   └── topology/
│       └── WindowedAggregationTopology.java  # Streams topology
│
├── src/main/resources/
│   ├── application.properties        # Application configuration
│   └── log4j2.xml                    # Logging configuration
│
├── src/test/java/com/github/osodevops/kafka/
│   ├── topology/
│   │   └── WindowedAggregationTopologyTest.java  # Unit tests
│   └── integration/
│       └── StreamsIntegrationTest.java           # Integration tests
│
├── scripts/
│   ├── create-topics.sh              # Manual topic creation
│   └── produce-sample-data.sh        # Sample data generator
│
└── doc/
    ├── kafka-streams-topic-naming-prd.md  # Product requirements
    ├── topic-naming-guide.md              # Topic naming conventions
    ├── topic-retention-and-deletion.md    # Retention and deletion policies
    └── deployment-guide.md                # Deployment instructions

Topic Naming Convention

All topics follow the organizational pattern: {{domain_id}}-{{environment}}-{{accessibility}}-{{service}}-{{function}}

Example Configuration (in TopicConfig.java):

  • Domain: cus (Customer)
  • Environment: s (Staging)
  • Accessibility: pub (Public)
  • Service: windowed-agg
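A minimal sketch of how TopicConfig.java might assemble these segments; the field and method names here are assumptions for illustration, not copied from the repository:

```java
// Hypothetical sketch of TopicConfig.java. It assembles topic names from
// the organizational pattern {domain}-{environment}-{accessibility}-{service}-{function}.
final class TopicConfig {
    static final String DOMAIN = "cus";          // Customer
    static final String ENVIRONMENT = "s";       // Staging
    static final String ACCESSIBILITY = "pub";   // Public
    static final String SERVICE = "windowed-agg";

    // Prefix shared by all topics, also used as the Streams application.id
    static final String APPLICATION_ID =
            String.join("-", DOMAIN, ENVIRONMENT, ACCESSIBILITY, SERVICE);

    static String topic(String function) {
        return APPLICATION_ID + "-" + function;
    }

    public static void main(String[] args) {
        System.out.println(topic("input"));   // cus-s-pub-windowed-agg-input
        System.out.println(topic("output"));  // cus-s-pub-windowed-agg-output
    }
}
```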

Application Topics

| Topic Name | Purpose | Cleanup Policy | Retention |
| --- | --- | --- | --- |
| cus-s-pub-windowed-agg-input | Consumes raw events | delete | 7 days |
| cus-s-pub-windowed-agg-output | Publishes aggregated results | delete | 7 days |

Internal Topics

| Topic Name | Purpose | Cleanup Policy | Retention |
| --- | --- | --- | --- |
| cus-s-pub-windowed-agg-event-count-store-changelog | State store changelog | compact,delete | 7 days |
| cus-s-pub-windowed-agg-events-by-key-repartition | Data repartitioning | delete | 1 hour |

See Topic Retention and Deletion Guide for detailed configuration.
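The tables above can be expressed as per-topic configuration maps, as they might be handed to the Admin client or an IaC tool. This is an illustrative, stdlib-only sketch derived from the tables, not the repository's actual provisioning code:

```java
import java.util.Map;

// Per-topic configs matching the tables above (values in milliseconds).
final class TopicRetention {
    static final long DAY_MS = 24L * 60 * 60 * 1000;  // 86,400,000
    static final long HOUR_MS = 60L * 60 * 1000;      //  3,600,000

    static final Map<String, Map<String, String>> CONFIGS = Map.of(
        "cus-s-pub-windowed-agg-input",
            Map.of("cleanup.policy", "delete",
                   "retention.ms", String.valueOf(7 * DAY_MS)),
        "cus-s-pub-windowed-agg-output",
            Map.of("cleanup.policy", "delete",
                   "retention.ms", String.valueOf(7 * DAY_MS)),
        "cus-s-pub-windowed-agg-event-count-store-changelog",
            Map.of("cleanup.policy", "compact,delete",
                   "retention.ms", String.valueOf(7 * DAY_MS)),
        "cus-s-pub-windowed-agg-events-by-key-repartition",
            Map.of("cleanup.policy", "delete",
                   "retention.ms", String.valueOf(HOUR_MS)));
}
```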

Code Implementation

Changelog Topic:

// TopicConfig.java sets APPLICATION_ID = "cus-s-pub-windowed-agg"
Materialized.<String, AggregationState, WindowStore<Bytes, byte[]>>as("event-count-store")
// Results in: cus-s-pub-windowed-agg-event-count-store-changelog

Repartition Topic:

// TopicConfig.java sets APPLICATION_ID = "cus-s-pub-windowed-agg"
Grouped.<String, Event>as("events-by-key")
// Results in: cus-s-pub-windowed-agg-events-by-key-repartition
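Kafka Streams derives these internal topic names deterministically: the application.id is prefixed to the name passed to Materialized.as() or Grouped.as(), with a -changelog or -repartition suffix. A stdlib-only sketch of that derivation:

```java
// Kafka Streams names internal topics by prefixing the application.id:
//   <application.id>-<store name>-changelog      (from Materialized.as)
//   <application.id>-<grouped name>-repartition  (from Grouped.as)
final class InternalTopicNames {
    static String changelog(String appId, String storeName) {
        return appId + "-" + storeName + "-changelog";
    }

    static String repartition(String appId, String groupedName) {
        return appId + "-" + groupedName + "-repartition";
    }

    public static void main(String[] args) {
        String appId = "cus-s-pub-windowed-agg";
        System.out.println(changelog(appId, "event-count-store"));
        // cus-s-pub-windowed-agg-event-count-store-changelog
        System.out.println(repartition(appId, "events-by-key"));
        // cus-s-pub-windowed-agg-events-by-key-repartition
    }
}
```

Because the derivation is a pure function of application.id and the explicit names, the topics can be pre-created and managed via IaC.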

See Topic Naming Guide for complete details.

Configuration

Key Properties

Application Configuration (application.properties):

# Application identifier (used as the prefix for internal topic names)
application.id=cus-s-pub-windowed-agg

# Kafka brokers
bootstrap.servers=localhost:9092

# CRITICAL: auto topic creation must be disabled (a broker-side setting)
auto.create.topics.enable=false

# Processing guarantee
processing.guarantee=exactly_once_v2

# State directory
state.dir=/tmp/kafka-streams
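These settings end up in a java.util.Properties object before the KafkaStreams instance is constructed. A stdlib-only sketch (the inline string mirrors the file above; no Kafka classes are used here):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: loading the application.properties settings shown above.
final class ConfigLoader {
    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader("""
                    application.id=cus-s-pub-windowed-agg
                    bootstrap.servers=localhost:9092
                    processing.guarantee=exactly_once_v2
                    state.dir=/tmp/kafka-streams
                    """));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with a StringReader
        }
        return props;
    }
}
```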

Environment-Specific Configuration

Override via system properties:

java -jar app.jar \
  -Dkafka.bootstrap.servers=prod-kafka:9092 \
  -Dkafka.replication.factor=3

Testing

Unit Tests

Tests use TopologyTestDriver for fast, isolated testing:

# Run unit tests
mvn test -Dtest=WindowedAggregationTopologyTest

Test Coverage:

  • Single event aggregation
  • Multiple events in same window
  • Events split across windows
  • Different keys produce independent aggregations
  • Window timestamp correctness
  • Topic naming verification

Integration Tests

Tests use Testcontainers with real Kafka:

# Run integration tests
mvn test -Dtest=StreamsIntegrationTest

Test Coverage:

  • End-to-end event processing
  • Internal topic creation verification
  • Topic naming convention compliance
  • Multi-instance behavior

Run All Tests

# All tests
mvn test

# With coverage report
mvn test jacoco:report

Key Implementation Details

Windowed Aggregation

The application performs time-based aggregation:

  1. Input: Events with key, type, value, and timestamp
  2. Grouping: Group events by key (creates repartition topic)
  3. Windowing: 5-minute tumbling windows
  4. Aggregation: Count, sum, and average per window
  5. Output: Aggregated statistics with window boundaries
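Steps 3 and 4 can be sketched without any Kafka classes: a tumbling window assigns each event timestamp to a fixed 5-minute bucket, and the aggregate tracks count, sum, and average. Class and field names here are illustrative; the repository's AggregationState may differ:

```java
// Stdlib-only sketch of tumbling-window assignment and the
// count/sum/average aggregate.
final class WindowedAggregate {
    static final long WINDOW_SIZE_MS = 5 * 60 * 1000; // 5-minute tumbling window

    long count;
    double sum;

    // A tumbling window is identified by its start: the event timestamp
    // rounded down to the nearest window boundary.
    static long windowStart(long timestampMs) {
        return (timestampMs / WINDOW_SIZE_MS) * WINDOW_SIZE_MS;
    }

    WindowedAggregate add(double value) {
        count++;
        sum += value;
        return this;
    }

    double average() {
        return count == 0 ? 0.0 : sum / count;
    }
}
```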

State Management

  • Store Name: event-count-store
  • Store Type: Windowed key-value store
  • Changelog Topic: cus-s-pub-windowed-agg-event-count-store-changelog
  • Retention: Based on window size and grace period
  • Compaction: Enabled for changelog topic

Serialization

Custom JSON serialization using Jackson:

  • Input: Event → JSON
  • Output: AggregatedEvent → JSON
  • State: Internal aggregation state → JSON
  • Timestamp: ISO 8601 format
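The ISO 8601 window timestamps can be produced with java.time alone; a sketch with the Jackson wiring omitted:

```java
import java.time.Instant;

// Sketch: formatting window boundaries as ISO 8601 strings for the JSON
// output. Instant.toString() emits ISO-8601 in UTC with a 'Z' suffix.
final class WindowTimestamps {
    static String iso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(iso(0L)); // 1970-01-01T00:00:00Z
    }
}
```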

Monitoring

Kafka UI

Access at http://localhost:8080 to view:

  • Topic messages and metadata
  • Consumer group lag
  • Broker metrics
  • Schema registry (if using Avro)

JMX Metrics

Key metrics to monitor:

# Stream state
kafka.streams:type=stream-state-metrics,state-id=*

# Thread performance
kafka.streams:type=stream-thread-metrics,thread-id=*

# Task metrics
kafka.streams:type=stream-task-metrics,task-id=*
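These MBeans can also be read in-process through the platform MBean server; a stdlib-only sketch using the thread-metrics pattern above (the query returns an empty set unless a Streams instance is running in the same JVM):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: querying Kafka Streams thread metrics from inside the JVM.
final class StreamsMetrics {
    static Set<ObjectName> threadMetrics() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.queryNames(
                    new ObjectName("kafka.streams:type=stream-thread-metrics,thread-id=*"),
                    null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e); // pattern above is valid
        }
    }
}
```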

Logs

Application logs location:

  • Console: Standard output
  • File: logs/kafka-streams-app.log (configurable in log4j2.xml)

Deployment

See Deployment Guide for:

  • Production deployment steps
  • Topic creation scripts
  • Configuration examples
  • Scaling strategies
  • Troubleshooting guide

Production Checklist

  • All topics manually created with appropriate replication
  • Auto topic creation disabled on brokers
  • State directory on persistent storage
  • Monitoring and alerting configured
  • ACLs configured (if security enabled)
  • Backup and recovery procedures in place

Troubleshooting

Common Issues

Application won't start: "Topic does not exist"

Ensure all topics are created via docker-compose or manually:

docker-compose up -d
docker-compose logs kafka-setup

Wrong topic names created

Verify application.id matches TopicConfig.APPLICATION_ID:

// In TopicConfig.java
public static final String APPLICATION_ID = "cus-s-pub-windowed-agg";

No output produced

Check:

  1. Application is running and consuming
  2. Events are being produced to input topic
  3. Window time has advanced (wait 5+ minutes)

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Documentation

  • Topic Naming Guide (doc/topic-naming-guide.md)
  • Topic Retention and Deletion Guide (doc/topic-retention-and-deletion.md)
  • Deployment Guide (doc/deployment-guide.md)
  • Product Requirements (doc/kafka-streams-topic-naming-prd.md)

Technology Stack

  • Java: 17
  • Kafka: 4.1.0
  • Kafka Streams: 4.1.0
  • Jackson: 2.18.2
  • JUnit: 5.11.4
  • Testcontainers: 1.20.4
  • Maven: 3.x
  • Docker Compose file format: 3.8

License

This project is provided as an example implementation for educational purposes.

Support

For issues and questions:

  1. Check the Troubleshooting section
  2. Review the documentation
  3. Check Kafka Streams logs for error details
  4. Open an issue with reproduction steps

Built with explicit topic naming for operational excellence
