Replicating a FlinkSQL ETL Enrichment Pattern with Timeplus Proton #957
Replies: 1 comment
-
Hi @fsiddiqui-mdsol, thank you for your interest in Timeplus Proton. CDC processing and joins are a typical use case in Timeplus Proton as well. Here are some brief answers to your questions above (restated below; a rough Proton SQL sketch follows them). Happy to discuss more via Slack chat or hop on a quick Zoom call.
Q#1: Does Timeplus Proton offer a built-in mechanism for handling CDC and managing state, similar to Flink's RocksDB-backed state and ROW_NUMBER() over a DISTINCT or upsert view?
Q#2: What is the idiomatic Timeplus Proton approach to performing stream-to-stream joins for enrichment on multiple Kafka topics?
Q#3: In Flink, even adding a new column breaks the state, which requires restarting the job from the beginning of the Kafka topic offsets with a blackhole sink; after the state is established, the savepoint is moved from the original job to another job. How do we handle this in Timeplus?
Q#4: We have savepoints configured to run a few times a day; in case of infra upgrades and patching, these savepoints are used. We maintain only a few days of savepoints on S3. What is the equivalent pattern in Proton?
Q#5: We have 300+ clients' data streaming into Kafka topics, going through a series of joins and business logic sourced from 15 tables/topics, with 59 vertices in the Flink DAG. For any change in the master tables, the entire transactional dataset is recomputed, causing backpressure. As a result, checkpoints/savepoints time out and the job eventually restarts. How is this handled in Proton?
Q#6: Intermediate state is a nightmare to crack, and upstream data in a Kafka topic is not guaranteed to be in sequence. For example, client1's data changes at T1 and client2's data changes at T2, but client2's data may be produced to Kafka earlier than client1's.
Q#7: Because of the large state and joins, we have to dedicate at least 30 GB of managed memory (RocksDB) out of a 40 GB TaskManager total, with 15 CPU cores per pod and local SSDs. How are memory allocation and tuning handled in Proton?
Q#8: How is observability done on Proton-based materialized views, e.g. backpressure, state size, checkpoint/savepoint failures, and throughput per operator?
Q#9: How do we write UDFs in Proton? Can we register an existing JAR file and expose its functions in Proton? Does it impact performance?
Q#10: Does Proton support a kafka-upsert connector as a sink to represent insert/update/delete?
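For illustration, a minimal sketch of what the latest-state-per-key plus enrichment-join pattern can look like in Proton SQL, using external streams and a versioned_kv stream as described in the Proton docs. All stream, topic, broker, and column names here are hypothetical placeholders, not a verified drop-in answer:

```sql
-- Read a CDC topic from Kafka as an external stream (placeholder names).
CREATE EXTERNAL STREAM table1_raw (
  id int64,
  name string
)
SETTINGS type = 'kafka',
         brokers = 'kafka:9092',
         topic = 'table1_cdc',
         data_format = 'JSONEachRow';

-- Keep only the latest row per primary key (upsert semantics),
-- playing the role of Flink's ROW_NUMBER() dedup view.
CREATE STREAM table1_latest (
  id int64,
  name string
)
PRIMARY KEY (id)
SETTINGS mode = 'versioned_kv';

-- Continuously load the raw CDC events into the keyed stream.
CREATE MATERIALIZED VIEW mv_table1_latest INTO table1_latest AS
SELECT id, name FROM table1_raw;

-- Enrichment join: joining an append stream against a versioned_kv
-- stream picks up the latest version of each key.
-- (table2_raw: another external stream, defined like table1_raw.)
SELECT t2.id, t2.t1_id, t1.name
FROM table2_raw AS t2
JOIN table1_latest AS t1 ON t2.t1_id = t1.id;
```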
-
We have an established ETL enrichment pattern in our FlinkSQL environment that we're looking to replicate using Timeplus Proton. Our current Flink pipeline processes change data capture (CDC) events from multiple Kafka topics, manages state via RocksDB, and performs stream-to-stream joins to produce a unified output.
The FlinkSQL pattern involves the following steps (a condensed sketch follows the list):
Ingesting CDC Data: Independent Kafka topics are consumed as Flink tables (table1, table2, table3, …, tablen).
State Management & Deduplication: We use ROW_NUMBER() partitioned by the primary key to handle late and out-of-order events, effectively creating a deduplicated, up-to-date view of each table's state.
Stream Enrichment: We perform stream-to-stream joins on these deduplicated views (table1_dist, table2_dist, table3_dist, …, tablen_dist) based on their logical relationships (t1.id = t2.t1_id, t2.t3_id = t3.id).
Sinking to a Target Topic: The final enriched dataset is inserted into a target Kafka topic. Some of the jobs produce data to an upsert-kafka connector with tombstone records.
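For concreteness, a condensed FlinkSQL sketch of the pattern above; table names, columns, and connector options are placeholders:

```sql
-- Source: one CDC topic consumed as a Flink table (options are placeholders).
CREATE TABLE table1 (
  id BIGINT,
  payload STRING,
  cdc_timestamp TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'table1_cdc',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Deduplicated, latest-state view per primary key.
CREATE VIEW table1_dist AS
SELECT id, payload, cdc_timestamp
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY id ORDER BY cdc_timestamp DESC
         ) AS rn
  FROM table1
)
WHERE rn = 1;

-- Enrichment join across the deduplicated views
-- (table2_dist/table3_dist defined analogously; enriched_sink elsewhere).
INSERT INTO enriched_sink
SELECT t1.id,
       t1.payload AS t1_payload,
       t2.payload AS t2_payload,
       t3.payload AS t3_payload
FROM table1_dist AS t1
JOIN table2_dist AS t2 ON t1.id = t2.t1_id
JOIN table3_dist AS t3 ON t2.t3_id = t3.id;
```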
Our goal is to understand how to implement this pattern efficiently in Timeplus Proton. Specifically, we'd like to know:
Q#1: Does Timeplus Proton offer a built-in mechanism for handling CDC and managing state, similar to Flink's RocksDB-backed state and ROW_NUMBER() over a DISTINCT or upsert view?
Q#2: What is the idiomatic Timeplus Proton approach to performing stream-to-stream joins for enrichment on multiple Kafka topics?
Q#3: In Flink, even adding a new column breaks the state, which requires restarting the job from the beginning of the Kafka topic offsets with a blackhole sink; after the state is established, the savepoint is moved from the original job to another job. How do we handle this in Timeplus?
Q#4: We have savepoints configured to run a few times a day; in case of infra upgrades and patching, these savepoints are used. We maintain only a few days of savepoints on S3. What is the equivalent pattern in Proton?
Q#5: We have 300+ clients' data streaming into Kafka topics, going through a series of joins and business logic sourced from 15 tables/topics, with 59 vertices in the Flink DAG. For any change in the master tables, the entire transactional dataset is recomputed, causing backpressure. As a result, checkpoints/savepoints time out and the job eventually restarts. How is this handled in Proton?
Q#6: Intermediate state is a nightmare to crack, and upstream data in a Kafka topic is not guaranteed to be in sequence. For example, client1's data changes at T1 and client2's data changes at T2, but client2's data may be produced to Kafka earlier than client1's; hence cdc_timestamp is not always largest at the largest offset. This forces us to disable watermarks in our Flink jobs. What is the pattern in Proton for such data behavior? (Flink 2.1 released some SQL to view data from state, but figuring out the operator name from FlinkSQL is still challenging.)
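For reference, FlinkSQL can express a "last arrived wins" dedup by ordering on a processing-time attribute instead of cdc_timestamp, which sidesteps the watermark dependence described above; a sketch with placeholder columns:

```sql
-- Keep the most recently *arrived* row per key, regardless of
-- cdc_timestamp. Assumes table1 declares: proc_time AS PROCTIME().
CREATE VIEW table1_arrival_dist AS
SELECT id, payload, cdc_timestamp
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY id ORDER BY proc_time DESC
         ) AS rn
  FROM table1
)
WHERE rn = 1;
```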
Q#7: Because of the large state and joins, we have to dedicate at least 30 GB of managed memory (RocksDB) out of a 40 GB TaskManager total, with 15 CPU cores per pod and local SSDs. How are memory allocation and tuning handled in Proton?
Q#8: How is observability done on Proton-based materialized views, e.g. backpressure, state size, checkpoint/savepoint failures, and throughput per operator?
Q#9: How do we write UDFs in Proton? Can we register an existing JAR file and expose its functions in Proton? Does it impact performance?
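For context on the Proton side: the Proton docs describe JavaScript-based local UDFs and remote UDFs over HTTP rather than JAR registration. A minimal sketch, with an arbitrary function name and body:

```sql
-- Minimal JavaScript UDF sketch per the Proton docs; the function
-- receives a batch of values as an array and returns an array.
CREATE OR REPLACE FUNCTION add_five(value float32)
RETURNS float32
LANGUAGE JAVASCRIPT AS $$
  function add_five(values) {
    // values: one entry per row in the batch
    return values.map(v => v + 5);
  }
$$;

-- Usage: SELECT add_five(amount) FROM some_stream;
```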
Q#10: Does Proton support a kafka-upsert connector as a sink to represent insert/update/delete?
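For reference, this is the Flink-side sink being asked about: an upsert-kafka table keyed on the primary key, where deletes are emitted as tombstones (the key with a null record value). Names and options are placeholders:

```sql
CREATE TABLE enriched_upsert (
  id BIGINT,
  t1_payload STRING,
  t2_payload STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'enriched_out',
  'properties.bootstrap.servers' = 'kafka:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```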
Any guidance on how Proton handles these complexities would be greatly appreciated.