-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Fix](StreamingJob) fix postgres consumer data in multi backend #59798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 31764 ms |
TPC-DS: Total hot run time: 172827 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run cloud_p0 |
FE Regression Coverage ReportIncrement line coverage |
# Conflicts: # fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingMultiTblTask.java
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 31642 ms |
TPC-DS: Total hot run time: 171994 ms |
|
run p0 |
|
run buildall |
TPC-H: Total hot run time: 32020 ms |
TPC-DS: Total hot run time: 172939 ms |
FE UT Coverage ReportIncrement line coverage |
|
run nonConcurrent |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes PostgreSQL consumer data handling in multi-backend scenarios by ensuring proper slot management and connection cleanup. The changes address the issue where PostgreSQL slots can only be used by one client at a time, necessitating proper initialization and cleanup strategies.
Changes:
- Added early PostgreSQL slot creation during job initialization to prevent conflicts in multi-backend scenarios
- Moved connection cleanup from mid-processing to the
finishSplitRecordsmethod to properly release connections after each read cycle - Refactored tests to use proper assertions and polling mechanisms instead of fixed sleep delays
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
PostgresSourceReader.java |
Added log message when replication slot already exists |
MySqlSourceReader.java |
Moved binlog reader cleanup from mid-processing to finishSplitRecords method |
JdbcIncrementalSourceReader.java |
Moved binlog reader cleanup from mid-processing to finishSplitRecords method |
ClientController.java |
Added new /api/initReader endpoint for early source reader initialization |
JdbcSourceOffsetProvider.java |
Added initSourceReader method to initialize readers for latest mode scenarios |
StreamingMultiTblTask.java |
Enhanced logging to include backend information for better debugging |
test_streaming_postgres_job_priv.groovy |
Refactored test to expect early failure without replication privileges and use polling instead of sleep |
test_streaming_mysql_job_priv.groovy |
Replaced fixed sleep with polling mechanism to wait for task completion |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java
Show resolved
Hide resolved
...ient/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java
Show resolved
Hide resolved
...c_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java
Show resolved
Hide resolved
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_priv.groovy
Show resolved
Hide resolved
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_priv.groovy
Show resolved
Hide resolved
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_priv.groovy
Show resolved
Hide resolved
caoliang-web
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### What problem does this PR solve? Related PR: #59461 1. PostgreSQL uses slots for data consumption, but only one client can use a slot at a time. Therefore, after consuming data from the WAL phase, the slot needs to be closed. This doesn't affect MySQL, but it can be closed to avoid consuming connections. 2. Create pg slot first when create job 3. fix unstable case
…ackend #59798 (#59841) Cherry-picked from #59798 Co-authored-by: wudi <[email protected]>
What problem does this PR solve?
Related PR: #59461
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)