[Feature] Add Kafka Integration for StarRocks Audit Loader Plugin #28
Summary
This PR adds Apache Kafka integration to the StarRocks Audit Loader plugin, enabling real-time streaming of audit logs to Kafka topics alongside or instead of the existing Stream Load mechanism. The implementation provides a flexible, pluggable architecture with multiple routing modes to support various deployment scenarios.
Motivation
Problem Statement
The current audit loader plugin only supports Stream Load as the output mechanism, which has several limitations:
Use Cases
Solution
This PR introduces a pluggable output architecture with Kafka integration, providing:
Key Features
✨ 4 Output Routing Modes:
- `streamload` - Traditional Stream Load only (default, backward compatible)
- `kafka` - Kafka only for pure streaming use cases
- `dual` - Both Stream Load and Kafka simultaneously for migration/redundancy
- `fallback` - Primary with fallback to secondary on failure for high availability

✨ Production-Ready Kafka Producer:
✨ JSON Serialization:
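As a hedged illustration of what JSON serialization of an audit event involves (the PR's actual implementation is `AuditEventSerializer.java`; the helper below is a plain-JDK sketch, not the PR's code, and its class and method names are invented for this example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stdlib-only serializer: escapes string values and emits a
// flat JSON object, preserving field insertion order.
class AuditJson {
    // Minimal JSON string escaping for quotes, backslashes, and control chars.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // Serialize a flat map of audit fields: numbers unquoted, strings escaped.
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(", ");
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\": ");
            Object v = e.getValue();
            if (v instanceof Number) sb.append(v);
            else sb.append('"').append(escape(String.valueOf(v))).append('"');
        }
        return sb.append('}').toString();
    }
}
```

The field set the real serializer emits is listed under "Message Format" below.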
✨ Flexible Configuration:
Architecture
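The pluggable design can be sketched from the interface described in the file list below. Interface and class names (`OutputHandler`, `OutputRouter`) come from this PR; the method bodies here are illustrative only, and the boolean return of `send()` is an assumption made so the routing logic can be shown:

```java
// Sketch of the pluggable output design and the four routing modes.
interface OutputHandler {
    void init() throws Exception;
    boolean send(String event); // assumption: reports per-event success
    void close();
    String getName();
    boolean isHealthy();
}

class OutputRouter {
    private final String mode;             // streamload | kafka | dual | fallback
    private final OutputHandler primary;
    private final OutputHandler secondary; // null in single-output modes

    OutputRouter(String mode, OutputHandler primary, OutputHandler secondary) {
        this.mode = mode;
        this.primary = primary;
        this.secondary = secondary;
    }

    boolean route(String event) {
        switch (mode) {
            case "dual": {   // deliver to both sinks; succeed only if both do
                boolean toPrimary = primary.send(event);
                boolean toSecondary = secondary.send(event);
                return toPrimary && toSecondary;
            }
            case "fallback": // try the healthy primary first, then the secondary
                if (primary.isHealthy() && primary.send(event)) return true;
                return secondary.send(event);
            default:         // "streamload" or "kafka": a single handler
                return primary.send(event);
        }
    }
}

// Tiny in-memory handler used to exercise the router.
class StubHandler implements OutputHandler {
    private final boolean healthy;
    int sent = 0;
    StubHandler(boolean healthy) { this.healthy = healthy; }
    public void init() {}
    public boolean send(String event) { if (!healthy) return false; sent++; return true; }
    public void close() {}
    public String getName() { return "stub"; }
    public boolean isHealthy() { return healthy; }
}
```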
Changes Made
New Files Created
Core Interfaces & Routing (186 lines)
- `src/main/java/com/starrocks/plugin/audit/output/OutputHandler.java` - defines `init()`, `send()`, `close()`, `getName()`, `isHealthy()`
- `src/main/java/com/starrocks/plugin/audit/routing/OutputRouter.java`

Kafka Integration (841 lines)
- `src/main/java/com/starrocks/plugin/audit/kafka/KafkaConfig.java` (179 lines)
- `src/main/java/com/starrocks/plugin/audit/kafka/KafkaProducerManager.java` (228 lines)
- `src/main/java/com/starrocks/plugin/audit/kafka/AuditEventSerializer.java` (191 lines)
- `src/main/java/com/starrocks/plugin/audit/kafka/KafkaMetrics.java` (143 lines)
- `src/main/java/com/starrocks/plugin/audit/output/KafkaOutputHandler.java` (100 lines)

Stream Load Wrapper (222 lines)
- `src/main/java/com/starrocks/plugin/audit/output/StreamLoadOutputHandler.java`

Modified Files
Core Plugin (245 lines modified)
- `src/main/java/com/starrocks/plugin/audit/AuditLoaderPlugin.java`
  - New `initializeOutputRouter()` method for handler setup
  - `loadIfNecessary()` updated to route events
  - `close()` updated to properly clean up the `OutputRouter`
  - `LoadWorker` updated to work with the new architecture

Dependencies & Configuration
- `pom.xml` - added `kafka-clients:3.6.0` dependency
- `src/main/assembly/plugin.conf` - added `output_mode=streamload`
- `plugin.conf.example`

Statistics
Configuration
Basic Kafka Configuration
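The configuration snippet originally shown here did not survive extraction, so the fragment below is a hedged reconstruction: `output_mode` is the key this PR adds to `plugin.conf`, while the `kafka_*` key names are illustrative guesses whose values map onto standard Kafka producer settings (`bootstrap.servers`, `acks`):

```properties
# output_mode is defined by this PR; kafka_* key names below are assumptions
output_mode=kafka
kafka_bootstrap_servers=broker1:9092,broker2:9092
kafka_topic=starrocks_audit
kafka_acks=1
```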
Advanced Configurations
Dual Mode (Both Stream Load + Kafka)
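A minimal sketch of the dual-mode switch. Only `output_mode` is a key documented by this PR; the existing Stream Load settings stay in place, and whatever Kafka keys the plugin defines are configured alongside:

```properties
# Write every event to both Stream Load and Kafka
output_mode=dual
```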
Fallback Mode (High Availability)
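A minimal sketch of the fallback switch. The assumed shape, per the routing-mode description above, is that the primary handler is tried first and the secondary on failure; which handler acts as primary is a plugin configuration detail not shown here:

```properties
# Primary output with automatic fallback to the secondary on failure
output_mode=fallback
```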
Security Configuration
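An illustrative SASL_SSL fragment. The `kafka_*` key names are assumptions; the right-hand values are standard Kafka client security properties:

```properties
output_mode=kafka
kafka_security_protocol=SASL_SSL
kafka_sasl_mechanism=SCRAM-SHA-512
kafka_sasl_jaas_config=org.apache.kafka.common.security.scram.ScramLoginModule required username="audit" password="<secret>";
```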
Message Format
Audit events are serialized as JSON:
```json
{
  "queryId": "d4c7a0d5-4e8f-11ed-bdc3-0242ac120002",
  "timestamp": "2024-11-05 16:30:45",
  "queryType": "query",
  "clientIp": "192.168.1.100",
  "user": "admin",
  "authorizedUser": "admin",
  "resourceGroup": "default",
  "catalog": "default_catalog",
  "db": "test_db",
  "state": "EOF",
  "errorCode": "",
  "queryTime": 125,
  "scanBytes": 1048576,
  "scanRows": 10000,
  "returnRows": 100,
  "cpuCostNs": 50000000,
  "memCostBytes": 2097152,
  "stmtId": 1,
  "isQuery": 1,
  "feIp": "172.16.0.10",
  "stmt": "SELECT * FROM users LIMIT 100",
  "digest": "3a7bd3e2c1f8d5e6b4a2d9c8e7f6a5b4",
  "planCpuCosts": 10000000.0,
  "planMemCosts": 524288.0,
  "pendingTimeMs": 5,
  "candidateMVs": "",
  "hitMvs": "",
  "warehouse": "default_warehouse"
}
```

Testing
Build & Compilation
Installation Testing
Functional Testing
1. Verify Plugin Initialization
2. Verify Message Delivery
3. Performance Testing
- `acks=1`

Compatibility Testing
- `output_mode=streamload` maintains existing behavior

Breaking Changes
None! 🎉
This is a fully backward-compatible feature addition:
- Default configuration unchanged (`output_mode=streamload`)

Migration Guide
For New Deployments
Simply configure Kafka settings in `plugin.conf` before installation.

For Existing Deployments
Option 1: Gradual Migration (Recommended)
Phase 1: Dual Mode (1-2 weeks)
Phase 2: Validation
Phase 3: Switch to Kafka Only
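Once validation passes, the switch itself is a one-line change (`output_mode` is the key this PR adds):

```properties
output_mode=kafka
```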
Option 2: Direct Switch
For non-critical environments:
Reinstall plugin:
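The reinstall step typically uses StarRocks' plugin statements; the plugin name and archive path below are illustrative, not taken from this PR:

```sql
UNINSTALL PLUGIN AuditLoader;
INSTALL PLUGIN FROM "/path/to/auditloader.zip";
```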
Performance Characteristics
Benchmarks
Optimization Tips
High Throughput:
Low Latency:
High Reliability:
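As a rough guide, the three profiles above correspond to standard Kafka producer properties (assuming the plugin forwards producer settings through; the values are illustrative starting points, not benchmarked recommendations):

```properties
# High throughput: larger batches, light compression
batch.size=65536
linger.ms=50
compression.type=lz4

# Low latency (alternative profile): send immediately, single-broker ack
# linger.ms=0
# acks=1

# High reliability (alternative profile): full acks, idempotent retries
# acks=all
# enable.idempotence=true
```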
Monitoring & Observability
Metrics Available
Logged in FE logs:
Health Checks
The plugin monitors producer health:
Troubleshooting
Common issues and solutions are documented in code comments.
Documentation
Included in PR
- `plugin.conf.example` - Comprehensive configuration examples
- `plugin.conf`

Future Work
Dependencies
New Dependencies
- `org.apache.kafka:kafka-clients:3.6.0`

Dependency Rationale
Backward Compatibility
API Compatibility
Configuration Compatibility
Runtime Compatibility
Security Considerations
Implemented Security Features