Fix: Inconsistent table names in Flink Kafka Sinks. #287

PrinceSajjadHussain · 2025-06-20T14:50:27Z

The start_job.py file creates a Kafka sink named process_events_kafka, while aggregation_job.py creates a Kafka source with the same name, process_events_kafka. Sharing one table name between a sink and a source is confusing and could lead to the aggregation job reading from the sink it is writing to, which would be incorrect.

Fix:
--- a/bootcamp\materials\4-apache-flink-training\src\job\start_job.py
+++ b/bootcamp\materials\4-apache-flink-training\src\job\start_job.py
@@ -5,7 +5,7 @@
 from pyflink.table import EnvironmentSettings, DataTypes, TableEnvironment, StreamTableEnvironment
 
 def create_processed_events_sink_kafka(t_env):
-    table_name = "process_events_kafka"
+    table_name = "raw_events_kafka"
     kafka_key = os.environ.get("KAFKA_WEB_TRAFFIC_KEY", "")
     kafka_secret = os.environ.get("KAFKA_WEB_TRAFFIC_SECRET", "")
     sasl_config = f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{kafka_key}" password="{kafka_secret}";'
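
As a rough illustration of the naming concern, here is a minimal sketch of a check that flags table names shared between jobs. The two table names are the ones mentioned in this PR (after the fix). The role labels, the `TABLE_NAMES` mapping, and the `find_duplicate_table_names` helper are hypothetical and are not part of the repository.

```python
from collections import Counter
from typing import List, Mapping

# Table names as described in this PR after the fix.
# The keys ("start_job.sink", ...) are hypothetical labels used only here.
TABLE_NAMES = {
    "start_job.sink": "raw_events_kafka",
    "aggregation_job.source": "process_events_kafka",
}


def find_duplicate_table_names(names_by_role: Mapping[str, str]) -> List[str]:
    """Return, in sorted order, every table name used by more than one role."""
    counts = Counter(names_by_role.values())
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    duplicates = find_duplicate_table_names(TABLE_NAMES)
    if duplicates:
        raise SystemExit(f"Table names used more than once: {duplicates}")
    print("All table names are distinct.")
```

A check like this could run in a test or at job start-up, assuming the names are collected in one place. It only compares names and says nothing about which Kafka topic each table points to.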

EcZachly

The update resolves the name conflict between the Flink Kafka sink and source, which matters for correct reads and writes. Avoiding this kind of confusion is important for a robust data processing pipeline.

Recommendation: Approve for Merge

EcZachly

Revoking previous approval: changes are needed.

EcZachly · 2025-08-06T02:44:15Z

bootcamp/materials/4-apache-flink-training/src/job/start_job.py

The fix for the inconsistent table names in Flink Kafka Sinks is concise and correctly addresses the issue: the same name was being used for both a Kafka sink and a Kafka source, which could lead to data flow issues.

Code Quality

The change is clean and adheres to the current coding style, making it easy to understand and maintain.

Appropriate use of environment variables ensures that sensitive information, such as Kafka keys and secrets, is handled properly.
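
The environment-variable handling mentioned above can be sketched as follows. This is a variation, not the code in start_job.py: the file shown in the diff falls back to empty strings, whereas this hypothetical `build_sasl_config` helper fails fast when a credential is missing. The variable names and the JAAS string format are taken from the diff.

```python
import os
from typing import Mapping


def build_sasl_config(env: Mapping[str, str] = os.environ) -> str:
    """Build the SASL/PLAIN JAAS config string from environment variables.

    Hypothetical helper: unlike the code in the diff, it raises an error
    instead of silently using empty credentials.
    """
    kafka_key = env.get("KAFKA_WEB_TRAFFIC_KEY", "")
    kafka_secret = env.get("KAFKA_WEB_TRAFFIC_SECRET", "")
    if not kafka_key or not kafka_secret:
        raise RuntimeError(
            "KAFKA_WEB_TRAFFIC_KEY and KAFKA_WEB_TRAFFIC_SECRET must both be set"
        )
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{kafka_key}" password="{kafka_secret}";'
    )
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.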

Review of Changes

Change: The table name was previously set to process_events_kafka and has been updated to raw_events_kafka. This change likely corrects a logical inconsistency by differentiating the sink for raw event data from the one for processed event data.

Creator Fairness

The change made by PrinceSajjadHussain seems well thought-out, enhancing the robustness of data flow and organization within the project.

Recommendation

I recommend merging this pull request. The alteration resolves the described issue and aligns with best practices for managing Kafka topics, which aids in preventing any accidental data overlaps or misrouting in Flink jobs.

PrinceSajjadHussain and others added 2 commits June 20, 2025 07:49

Fix: bootcamp\materials\4-apache-flink-training\src\job\start_job.py

37666be

Update start_job.py

8d386fe

EcZachly approved these changes Aug 6, 2025

EcZachly requested changes Aug 6, 2025

EcZachly approved these changes Aug 6, 2025

isangwanrahul merged commit d4d71b9 into DataExpert-io:main Aug 12, 2025
