Replies: 2 comments
-
Connector Library Placement: Where Should Common Code Live?Author: Chiradip The QuestionWhile working on But then I thought: What if the common connector code ( This isn't just a file organization question - it's a philosophical one about what "SDK" means for Iggy. Why I'm Even Considering This1. These Aren't Really "Flink Patterns" - They're "Iggy Patterns"When I look at what's in the common code:
All of these are Iggy integration patterns, not stream processor patterns. They're reusable precisely because they encode knowledge about Iggy's semantics, not Flink's or Spark's. 2. The Kafka PrecedentI looked at how Kafka does this: Kafka includes serialization, configuration patterns, and common utilities in the core client library. Flink and Spark connectors then depend on This suggests there's precedent for "integration patterns" living in the SDK. 3. User Experience SimplificationCurrent approach (separate library): dependencies {
implementation("org.apache.iggy:iggy:0.5.0") // SDK
implementation("org.apache.iggy:iggy-connector-library:0.5.0") // Common connector code
compileOnly("org.apache.flink:flink-streaming-java:1.18.0") // Flink
}If in SDK: dependencies {
implementation("org.apache.iggy:iggy:0.5.0") // SDK + common patterns
compileOnly("org.apache.flink:flink-streaming-java:1.18.0") // Flink
}One less dependency to manage. One less version to coordinate. Why I'm Hesitant1. SDK BloatThe Iggy SDK is currently lean and focused: it's a client for talking to Iggy servers. Adding ~1500 LOC of connector abstractions changes its character. Not everyone who uses the SDK needs:
A microservice that just publishes events to Iggy doesn't need this extra weight. 2. Dependency ContaminationThe connector abstractions would need dependencies: dependencies {
// Core Iggy SDK needs these
implementation("org.apache.httpcomponents.client5:httpclient5:5.4.3")
implementation("com.fasterxml.jackson.core:jackson-databind:2.18.0")
// NEW: Connector patterns would add these
implementation("org.apache.avro:avro:1.11.3") // For Avro serialization
implementation("io.micrometer:micrometer-core:1.12.0") // For metrics collection
// ... more as serialization formats grow
}Should every SDK user pull in Avro dependencies even if they never serialize to Avro? Counter-argument: We could use 3. Release CouplingIf the connector patterns need a breaking change (e.g., we discover offset tracking needs a different interface), it forces the entire SDK to bump major version. Example scenario:
With separate artifacts:
4. Conceptual BoundaryI think there's value in maintaining a clear line: SDK = "How to communicate with Iggy"
Connector Library = "How to integrate Iggy with data processing frameworks"
Different audiences:
Mixing these might confuse the story. 5. Portability ImplicationsIf we eventually contribute the Flink connector to Apache Flink (or it moves to a separate repo), having it depend on a lightweight Flink connector depending on Iggy SDK:
Flink connector depending on connector-common:
The Middle Ground I'm ConsideringOption 1: SDK with Modular StructureKeep it in SDK but make it clearly optional: Publish single artifact but document that Pros:
Cons:
Option 2: Separate Module, Co-Located in java-sdk ProjectPublished artifacts:
Pros:
Cons:
Option 3: Stay in iggy-connector-flink, Extract When Spark ArrivesThe current plan in
Pros:
Cons:
My Current ThinkingDecision FrameworkI'm leaning towards making this decision based on pattern maturity: Phase 1: Start in iggy-connector-flink (Now)Keep the current Why:
Design discipline:
Phase 2: Extract to java-connector-common (When Spark Arrives)After implementing Spark connector and validating >70% reuse: Why:
Migration:
Phase 3: Consider SDK Promotion (After 1+ Year Stability)After Evaluate:
If YES → Merge into SDK: If NO → Keep separate: Kafka vs Iggy: Key DifferenceOne thing I realized: Kafka's client already includes SerDes because that's fundamental to Kafka's usage model. Every Kafka producer/consumer configures serializers: Properties props = new Properties();
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.JsonSerializer");
KafkaProducer<String, User> producer = new KafkaProducer<>(props);Serialization is core to Kafka's API. So Iggy is different: The SDK sends/receives byte arrays. Serialization is application-level concern: IggyClient client = IggyClient.create(config);
byte[] messageBytes = objectMapper.writeValueAsBytes(user);
client.sendMessage(streamId, topicId, messageBytes);So for Iggy, serialization schemas aren't SDK-level - they're integration-level. This supports keeping them separate. My Decision (For Now)Stick with option_x_1.md StructureI'm going to:
Rationale
Design Principles to FollowTo make future extraction easy:
When to RevisitRevisit this decision when:
Questions I'm Still Pondering1. Should Serialization Live in SDK?Even if offset tracking and partition discovery stay in connectors, should serialization schemas be SDK-level? Argument for SDK:
Argument against:
Leaning: Keep in connector library for now, but this is a strong candidate for eventual SDK inclusion. 2. What About Client Pooling?Connection pooling in Many non-connector users would benefit from this. Maybe this should be in SDK sooner? 3. Naming: "Connector" vs "Integration"?
But "connector" is more specific and matches industry terminology (Kafka uses "connector"). Decision: Stick with "connector" for now. ConclusionI'm documenting this deliberation because I think it reflects an important architectural tension: SDK minimalism vs. integration convenience. Kafka chose convenience (include SerDes in client). That works for them because their usage model demands it. Iggy's SDK is cleaner without these concerns. But as we build the integration ecosystem (Flink, Spark, Beam), we need shared patterns. The three-phase evolution (experimental → extracted → promoted) gives us time to learn what belongs where without premature commitment. For option_x_1.md: The current design stands. No changes needed. This deliberation informs future decisions, not present ones. Status: Decision made - stick with current structure, document evolution path |
Beta Was this translation helpful? Give feedback.
-
|
That's great progress, I like it so far. As of your questions
|
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Iggy Connector Architecture Involving Data/Stream Processors like Flink
Option X: Flink-Centric Approach
Option X.1: Iggy-Housed Flink Connector
Option Y: Bidirectional Integration (Depends on X)
Option Z: Iggy-Centric Hub Architecture
Current Iggy Connectors handle:
External System → Iggy (Source)
Iggy → External System (Sink)
Missing Category - Stream Processors:
Iggy → Processor (Flink/Spark) → Iggy
(transform/aggregate/join)
Recommended Approach
I propose to start with Option X.1 for these reasons:
Proposed Structure for X.1:
This structure:
The Bigger Vision (Option Z)
Option Z suggests treating Iggy as a stream processing hub, where:
This makes Iggy not just a message broker, but a stream processing orchestrator.
Beta Was this translation helpful? Give feedback.
All reactions