Iggy Connector Architecture Involving Data/Stream Processors like Flink #2236

chiradip · 2025-10-03T19:15:32Z

chiradip
Oct 3, 2025

Iggy Connector Architecture Involving Data/Stream Processors like Flink

Option X: Flink-Centric Approach

Develop IggySource/IggySink as a separate Flink connector project (like flink-connector-kafka)
This would live in the Apache Flink ecosystem
Challenge: Requires Flink community acceptance, separate release cycles

Option X.1: Iggy-Housed Flink Connector

Put the Flink connector inside Iggy's source tree
New proposal: iggy/core/connectors/external_processors/
Challenge: How to structure this within Iggy's current architecture

Option Y: Bidirectional Integration (Depends on X)

Once IggySource/IggySink exist, Flink becomes both a source AND sink for Iggy
This creates a bidirectional flow: Iggy → Flink → Iggy
Enables Flink as a stream processor in Iggy pipelines

Option Z: Iggy-Centric Hub Architecture

Paradigm shift: Put "Iggy Hub" at the center
Treat Flink/Spark as pluggable processors, not just sources/sinks
Current Iggy connectors handle:
- Producers/Consumers (Kafka, databases)
- Queryable systems (APIs, databases)
Missing piece: Stream processors that transform, multiplex, demultiplex

Current Iggy Connectors handle:
External System → Iggy (Source)
Iggy → External System (Sink)

Missing Category - Stream Processors:
Iggy → Processor (Flink/Spark) → Iggy
(transform/aggregate/join)

Recommended Approach

I propose to start with Option X.1 for these reasons:

Immediate control - No external dependencies
Faster iteration - Direct integration with Iggy releases
Sets foundation for Option Z later

Proposed Structure for X.1:

  iggy/
  ├── core/
  │   ├── connectors/
  │   │   ├── sources/        # Current sources
  │   │   ├── sinks/          # Current sinks
  │   │   └── processors/     # NEW: Stream processors
  │   │       ├── flink/
  │   │       │   ├── src/
  │   │       │   │   ├── IggyFlinkSource.java
  │   │       │   │   ├── IggyFlinkSink.java
  │   │       │   │   └── IggyFlinkProcessor.java
  │   │       │   ├── Cargo.toml
  │   │       │   └── README.md
  │   │       └── spark/      # Future

This structure:

Acknowledges processors as a third connector type
Keeps Flink code within Iggy's control
Enables future Option Z implementation
Allows processors to be bidirectional (both source and sink)

The Bigger Vision (Option Z)

Option Z suggests treating Iggy as a stream processing hub, where:

                  ┌──────────────┐
                  │   Iggy Hub   │
                  └──────┬───────┘
                         │
      ┌──────────────────┼──────────────────┐
      │                  │                  │
 ┌────▼────┐      ┌──────▼──────┐    ┌─────▼────┐
 │  Flink  │      │    Spark    │    │  Custom  │
 │Processor│      │  Processor  │    │Processor │
 └─────────┘      └─────────────┘    └──────────┘

This makes Iggy not just a message broker, but a stream processing orchestrator.

chiradip · 2025-10-16T18:37:17Z

chiradip
Oct 16, 2025
Author

Connector Library Placement: Where Should Common Code Live?

Author: Chiradip
Date: 2025-10-16
Context: While architecting the Flink connector for Iggy, I realized we'd have substantial reusable code (~88%) for future processors like Spark. This led to a fundamental question about where this common connector code should live.

The Question

While working on option_x_1.md, I initially proposed:

external-processors/
└── iggy-connector-flink/
    └── iggy-connector-library/          # Contains common + Flink code
        └── org/apache/iggy/connector/
            ├── config/                  # Framework-agnostic
            ├── serialization/           # Framework-agnostic
            ├── offset/                  # Framework-agnostic
            └── flink/                   # Flink-specific

But then I thought: What if the common connector code (org.apache.iggy.connector.*) actually belongs in the Iggy java-sdk itself?

This isn't just a file organization question - it's a philosophical one about what "SDK" means for Iggy.

Why I'm Even Considering This

1. These Aren't Really "Flink Patterns" - They're "Iggy Patterns"

When I look at what's in the common code:

Offset tracking - This is about Iggy's consumer offset model, not Flink's
Partition discovery - This is about discovering Iggy partitions
Connection pooling - Managing connections to Iggy servers
Serialization schemas - Converting Iggy message bytes to/from objects

All of these are Iggy integration patterns, not stream processor patterns. They're reusable precisely because they encode knowledge about Iggy's semantics, not Flink's or Spark's.

2. The Kafka Precedent

I looked at how Kafka does this:

kafka-clients/
├── org/apache/kafka/clients/
│   ├── consumer/                    # Consumer client
│   ├── producer/                    # Producer client
│   └── admin/                       # Admin operations
└── org/apache/kafka/common/
    ├── serialization/               # SerDes (Serializer/Deserializer)
    ├── config/                      # Configuration abstractions
    └── errors/                      # Common exceptions

kafka-streams/                       # Higher-level stream processing API
flink-connector-kafka/               # Lives in Flink ecosystem, depends on kafka-clients
spark-connector-kafka/               # Lives in Spark ecosystem, depends on kafka-clients

Kafka includes serialization, configuration patterns, and common utilities in the core client library. Flink and Spark connectors then depend on kafka-clients and wrap it in their respective APIs.

This suggests there's precedent for "integration patterns" living in the SDK.

3. User Experience Simplification

Current approach (separate library):

dependencies {
    implementation("org.apache.iggy:iggy:0.5.0")                    // SDK
    implementation("org.apache.iggy:iggy-connector-library:0.5.0")  // Common connector code
    compileOnly("org.apache.flink:flink-streaming-java:1.18.0")     // Flink
}

If in SDK:

dependencies {
    implementation("org.apache.iggy:iggy:0.5.0")                    // SDK + common patterns
    compileOnly("org.apache.flink:flink-streaming-java:1.18.0")     // Flink
}

One less dependency to manage. One less version to coordinate.

Why I'm Hesitant

1. SDK Bloat

The Iggy SDK is currently lean and focused: it's a client for talking to Iggy servers. Adding ~1500 LOC of connector abstractions changes its character.

Not everyone who uses the SDK needs:

Offset tracking strategies
Partition discovery algorithms
Sophisticated retry policies
Multiple serialization formats (JSON, Avro, etc.)

A microservice that just publishes events to Iggy doesn't need this extra weight.

2. Dependency Contamination

The connector abstractions would need dependencies:

dependencies {
    // Core Iggy SDK needs these
    implementation("org.apache.httpcomponents.client5:httpclient5:5.4.3")
    implementation("com.fasterxml.jackson.core:jackson-databind:2.18.0")

    // NEW: Connector patterns would add these
    implementation("org.apache.avro:avro:1.11.3")              // For Avro serialization
    implementation("io.micrometer:micrometer-core:1.12.0")    // For metrics collection
    // ... more as serialization formats grow
}

Should every SDK user pull in Avro dependencies even if they never serialize to Avro?

Counter-argument: We could use compileOnly or make them optional dependencies, but that adds complexity.

3. Release Coupling

If the connector patterns need a breaking change (e.g., we discover offset tracking needs a different interface), it forces the entire SDK to bump major version.

Example scenario:

SDK is stable at v0.6.0
We discover offset tracking needs API changes for Spark integration
Must release SDK v0.7.0 even though client code didn't change
All users must upgrade, even those not using connectors

With separate artifacts:

SDK stays at v0.6.0
iggy-connector-common bumps to v0.4.0
Only connector users are affected

4. Conceptual Boundary

I think there's value in maintaining a clear line:

SDK = "How to communicate with Iggy"

Connect to server
Send messages
Receive messages
Manage streams/topics
Handle consumer groups

Connector Library = "How to integrate Iggy with data processing frameworks"

Offset management strategies
Partition assignment logic
Serialization format handling
Retry and error handling patterns

Different audiences:

SDK users: Application developers building services
Connector users: Data engineers building pipelines

Mixing these might confuse the story.

5. Portability Implications

If we eventually contribute the Flink connector to Apache Flink (or it moves to a separate repo), having it depend on a lightweight iggy-connector-common is cleaner than depending on the full Iggy SDK.

Flink connector depending on Iggy SDK:

Pulls in HTTP client, full message models, admin APIs
Heavier dependency footprint

Flink connector depending on connector-common:

Just the abstractions needed for integration
Lighter, more focused

The Middle Ground I'm Considering

Option 1: SDK with Modular Structure

Keep it in SDK but make it clearly optional:

java-sdk/
└── src/
    └── main/java/org/apache/iggy/
        ├── client/              # Core client (required)
        ├── messages/            # Message models (required)
        ├── streams/             # Stream operations (required)
        └── connector/           # Integration patterns (optional)
            ├── config/
            ├── serialization/
            ├── offset/
            └── partition/

Publish single artifact but document that connector.* packages are optional utilities for integration scenarios.

Pros:

One artifact, simpler for users
Natural discoverability
Connector patterns get SDK-level stability guarantees

Cons:

Still has bloat/dependency issues
Can't version independently
Mixed concerns in one artifact

Option 2: Separate Module, Co-Located in java-sdk Project

foreign/java/
├── java-sdk/                           # Core client
│   └── build.gradle.kts
├── java-connector-common/              # NEW: Connector abstractions
│   └── build.gradle.kts
│       dependencies {
│           api(project(":java-sdk"))   # Depends on SDK
│       }
└── external-processors/
    └── iggy-connector-flink/
        └── build.gradle.kts
            dependencies {
                api("org.apache.iggy:iggy-connector-common:0.5.0")
            }

Published artifacts:

org.apache.iggy:iggy - Core SDK
org.apache.iggy:iggy-connector-common - Connector abstractions
Flink/Spark connectors depend on iggy-connector-common

Pros:

Clean separation of concerns
Independent versioning
SDK stays lean
Connector patterns live at Iggy level (not scattered in processor projects)
Easy to find: still in foreign/java/

Cons:

Two artifacts instead of one
Users need both dependencies
More complex dependency graph

Option 3: Stay in iggy-connector-flink, Extract When Spark Arrives

The current plan in option_x_1.md:

Start with iggy-connector-library inside iggy-connector-flink/
Contains both common and Flink code
When Spark is added, evaluate and extract common to iggy-connector-common/ if reuse >70%

Pros:

Pragmatic: build first, extract later
Allows patterns to stabilize before promoting
YAGNI principle: don't create abstractions until second use case exists
Flink-first approach keeps things simple initially

Cons:

Requires refactoring when Spark arrives
Flink connector temporarily "owns" common code
Users might start depending on internal packages

My Current Thinking

Decision Framework

I'm leaning towards making this decision based on pattern maturity:

┌─────────────────────────────────────────────────────────────┐
│  Pattern Lifecycle                                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Experimental (in processor-specific package)            │
│     └─> Patterns are unproven, rapidly changing            │
│                                                             │
│  2. Stable (extract to java-connector-common)               │
│     └─> Battle-tested by 2+ processors, API stable         │
│                                                             │
│  3. Foundational (promote to java-sdk)                      │
│     └─> Universal patterns, used by majority of users      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Phase 1: Start in iggy-connector-flink (Now)

Keep the current option_x_1.md structure:

external-processors/
└── iggy-connector-flink/
    └── iggy-connector-library/
        └── org/apache/iggy/connector/
            ├── config/                  # Common abstractions
            ├── serialization/           # Common abstractions
            ├── offset/                  # Common abstractions
            └── flink/                   # Flink-specific

Why:

Patterns are still experimental - I haven't actually built them yet!
Allows rapid iteration without breaking SDK compatibility
No commitment until we see real-world usage
YAGNI: don't extract until proven valuable by second consumer (Spark)

Design discipline:

Keep org.apache.iggy.connector.* (non-flink) packages framework-agnostic
No Flink imports in common code
Design as if they'll be extracted later

Phase 2: Extract to java-connector-common (When Spark Arrives)

After implementing Spark connector and validating >70% reuse:

foreign/java/
├── java-sdk/                           # Lean core
├── java-connector-common/              # NEW: Proven patterns
│   └── org/apache/iggy/connector/
│       ├── config/
│       ├── serialization/
│       └── offset/
└── external-processors/
    ├── iggy-connector-flink/
    │   └── iggy-flink-connector/       # Flink-specific only
    └── iggy-connector-spark/
        └── iggy-spark-connector/       # Spark-specific only

Why:

Patterns proven stable by 2 implementations
Clear reuse case justifies extraction
Clean separation achieved
Independent versioning established

Migration:

Create java-connector-common/ module
Move org.apache.iggy.connector.* packages from Flink library
Both Flink and Spark depend on connector-common
Publish org.apache.iggy:iggy-connector-common

Phase 3: Consider SDK Promotion (After 1+ Year Stability)

After connector-common has been stable for 1+ year and multiple processors (Flink, Spark, maybe Beam):

Evaluate:

Are these patterns used by >50% of Iggy users (not just processor integrations)?
Have APIs been stable (no breaking changes in a year)?
Are dependencies minimal (no heavy libs like Avro required in common)?
Do non-processor use cases benefit (e.g., custom consumers using offset tracking)?

If YES → Merge into SDK:

java-sdk/
└── org/apache/iggy/
    ├── client/              # Core client
    ├── connector/           # Promoted patterns
    └── ...

If NO → Keep separate:
Connector patterns remain at integration level, not SDK level.

Kafka vs Iggy: Key Difference

One thing I realized: Kafka's client already includes SerDes because that's fundamental to Kafka's usage model.

Every Kafka producer/consumer configures serializers:

Properties props = new Properties();
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.JsonSerializer");
KafkaProducer<String, User> producer = new KafkaProducer<>(props);

Serialization is core to Kafka's API. So kafka-clients includes SerDes.

Iggy is different: The SDK sends/receives byte arrays. Serialization is application-level concern:

IggyClient client = IggyClient.create(config);
byte[] messageBytes = objectMapper.writeValueAsBytes(user);
client.sendMessage(streamId, topicId, messageBytes);

So for Iggy, serialization schemas aren't SDK-level - they're integration-level. This supports keeping them separate.

My Decision (For Now)

Stick with option_x_1.md Structure

I'm going to:

Keep iggy-connector-library in iggy-connector-flink/ as designed in option_x_1.md
Use clear package separation: connector.* (common) vs connector.flink.* (Flink-specific)
Design for extraction: Write common code as if it will be pulled out later
Document the evolution path: Clearly state in README that common code may be extracted when Spark is added

Rationale

Pragmatic: Build and prove the patterns first
Flexible: Can extract to connector-common or even SDK later based on real data
Low risk: If extraction never happens (e.g., Spark reuse is <40%), we haven't pre-optimized
Reversible: Clear package boundaries make future refactoring straightforward

Design Principles to Follow

To make future extraction easy:

✅ No Flink imports in org.apache.iggy.connector.* (only in connector.flink.*)
✅ Minimal dependencies in common code (avoid heavy libs)
✅ Interface-based design (easy to swap implementations)
✅ Document extraction intent in package-info.java
✅ Treat as "internal" initially (no stability guarantees to external users)

When to Revisit

Revisit this decision when:

Spark connector is being implemented (evaluate actual reuse %)
Common patterns have been stable for 6+ months
External users ask to use connector patterns standalone
We discover non-processor use cases for these utilities

Questions I'm Still Pondering

1. Should Serialization Live in SDK?

Even if offset tracking and partition discovery stay in connectors, should serialization schemas be SDK-level?

Argument for SDK:

Every Iggy user needs to serialize/deserialize
Basic JSON/Avro helpers would benefit all users
Natural place for IggyMessage<T> type-safe wrappers

Argument against:

SDK is currently byte-array focused (clean, simple)
Serialization format is application choice
Adding Jackson/Avro deps to SDK is heavy

Leaning: Keep in connector library for now, but this is a strong candidate for eventual SDK inclusion.

2. What About Client Pooling?

Connection pooling in org.apache.iggy.connector.util.IggyClientPool - is this connector-level or SDK-level?

Many non-connector users would benefit from this. Maybe this should be in SDK sooner?

3. Naming: "Connector" vs "Integration"?

org.apache.iggy.connector.* implies these are connector-specific patterns.
org.apache.iggy.integration.* might better signal "reusable integration utilities."

But "connector" is more specific and matches industry terminology (Kafka uses "connector").

Decision: Stick with "connector" for now.

Conclusion

I'm documenting this deliberation because I think it reflects an important architectural tension: SDK minimalism vs. integration convenience.

Kafka chose convenience (include SerDes in client). That works for them because their usage model demands it.

Iggy's SDK is cleaner without these concerns. But as we build the integration ecosystem (Flink, Spark, Beam), we need shared patterns.

The three-phase evolution (experimental → extracted → promoted) gives us time to learn what belongs where without premature commitment.

For option_x_1.md: The current design stands. No changes needed. This deliberation informs future decisions, not present ones.

Status: Decision made - stick with current structure, document evolution path
Next review: When Spark connector implementation begins
Owner: Chiradip
Stakeholders: Iggy maintainers, future Flink/Spark connector contributors

0 replies

numinnex · 2025-10-16T18:59:47Z

numinnex
Oct 16, 2025
Collaborator

That's great progress, I like it so far.

As of your questions

I think we shouldn't put any serialization/deserialization code in the SDK right now, in the future when we'll have an high-level SDK matching the one from Rust, we can think about having some SDK side infrastructure for integration with schema registries such as Avro.
I would say that in the same vein as answer for question 1, this could be a feature of the high level SDK, but afaik Rust version does not have that, it's on the user of the SDK to create a connection pool themselves, if they wish for.
Connector as name is the way to go in my opinion.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iggy Connector Architecture Involving Data/Stream Processors like Flink #2236

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Iggy Connector Architecture Involving Data/Stream Processors like Flink #2236

Uh oh!

Uh oh!

chiradip Oct 3, 2025

Replies: 2 comments

Uh oh!

chiradip Oct 16, 2025 Author

Connector Library Placement: Where Should Common Code Live?

The Question

Why I'm Even Considering This

1. These Aren't Really "Flink Patterns" - They're "Iggy Patterns"

2. The Kafka Precedent

3. User Experience Simplification

Why I'm Hesitant

1. SDK Bloat

2. Dependency Contamination

3. Release Coupling

4. Conceptual Boundary

5. Portability Implications

The Middle Ground I'm Considering

Option 1: SDK with Modular Structure

Option 2: Separate Module, Co-Located in java-sdk Project

Option 3: Stay in iggy-connector-flink, Extract When Spark Arrives

My Current Thinking

Decision Framework

Phase 1: Start in iggy-connector-flink (Now)

Phase 2: Extract to java-connector-common (When Spark Arrives)

Phase 3: Consider SDK Promotion (After 1+ Year Stability)

Kafka vs Iggy: Key Difference

My Decision (For Now)

Stick with option_x_1.md Structure

Rationale

Design Principles to Follow

When to Revisit

Questions I'm Still Pondering

1. Should Serialization Live in SDK?

2. What About Client Pooling?

3. Naming: "Connector" vs "Integration"?

Conclusion

Uh oh!

numinnex Oct 16, 2025 Collaborator

chiradip
Oct 3, 2025

chiradip
Oct 16, 2025
Author

numinnex
Oct 16, 2025
Collaborator