-
Notifications
You must be signed in to change notification settings - Fork 498
[FLINK-36696] [flink-autoscaler-plugin-jdbc] Switch sql connection usages to datasource #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sharath1709 for the contribution!
Overall LGTM! And I still have some comments:
- The first question is related to connection leak for testing. This PR will create lots of new dataSources, but the close isn't called. IIUC, it will have connection leak.
- The CI is failed, please take a look in your free time.
- We encountered a bug related to this fix recently. The database master is crashed, the slave is promoted to be the master. The jdbc state store and event handler won't work due to they are using the fixed connection.
- As I understand, DataSource could solve this problem as well.
- When old connection is crashed, DataSource will create a new physical connection to connect the new database master server.
- Would you mind adding a test to check it? In the ITCase, we only have one databse server, I'm not sure could it be reproduced when the database server is restarted.
| @AfterEach | ||
| void afterEach() throws SQLException { | ||
| if (conn != null) { | ||
| conn.close(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't close dataSource here?
As I understand, each test will create a new dataSource. The old dataSource will leak some connections if we don't close dataSource here.
| try (var conn = getConnection()) { | ||
| var jdbcStateInteractor = new JdbcStateInteractor(conn); | ||
| assertThat(jdbcStateInteractor.queryData(jobKey)).isEmpty(); | ||
| var dataSource = getDataSource(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's similar with the last comment, we didn't close the dataSource.
3dbc348 to
b4d3979
Compare
|
Thanks a lot @1996fanrui for the quick review. Please find my replies below
|
b4d3979 to
3fcb5f7
Compare
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sharath1709 , thanks for the update!
I left some comments, and the CI is still failed (It cannot be compiled for now. ).
You could try to execute mvn clean install -DskipTests in your local before next push, thanks~
.../java/org/apache/flink/autoscaler/jdbc/testutils/databases/postgres/PostgreSQLExtension.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/flink/autoscaler/jdbc/testutils/databases/mysql/MySQLExtension.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/flink/autoscaler/jdbc/testutils/databases/derby/DerbyExtension.java
Outdated
Show resolved
Hide resolved
|
|
||
| try { | ||
| conf.set(JDBC_URL, String.format("%s;shutdown=true", jdbcUrl)); | ||
| HikariJDBCUtil.getConnection(conf).close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, conf.set(JDBC_URL, String.format("%s;shutdown=true", jdbcUrl)) means shutdown the derby server. Don't we need to call it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I wasn't familiar with this and assumed it only closes the connection. We can continue to keep the same logic and additionally close the datasource as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both close connection and datasource should be wrapped inside of try to prevent test failure.
We can check the comment: database shutdown ignored exception, this catch means the test won't be failed if database shutdown throws exception.
.../src/test/java/org/apache/flink/autoscaler/standalone/AutoscalerEventHandlerFactoryTest.java
Show resolved
Hide resolved
...st/java/org/apache/flink/autoscaler/jdbc/event/AbstractJdbcAutoscalerEventHandlerITCase.java
Outdated
Show resolved
Hide resolved
.../src/test/java/org/apache/flink/autoscaler/jdbc/event/AbstractJdbcEventInteractorITCase.java
Show resolved
Hide resolved
.../src/test/java/org/apache/flink/autoscaler/jdbc/state/AbstractJdbcStateInteractorITCase.java
Show resolved
Hide resolved
...caler-plugin-jdbc/src/test/java/org/apache/flink/autoscaler/jdbc/state/JobStateViewTest.java
Show resolved
Hide resolved
350dfdf to
7d29111
Compare
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sharath1709 for the update, overall LGTM. I left 2 minor comments, please take a look in your free time, thanks~
We encountered a bug related to this fix recently. The database master is crashed, the slave is promoted to be the master. The jdbc state store and event handler won't work due to they are using the fixed connection.
- As I understand, DataSource could solve this problem as well.
- When old connection is crashed, DataSource will create a new physical connection > to connect the new database master server.
- Would you mind adding a test to check it? In the ITCase, we only have one databse server, I'm not sure could it be reproduced when the database server is restarted.
I have checked it manually on my Mac, it works well with this PR. Also, as we discussed before, it's better to add a test to check it.
| var dataSource = getDataSource(); | ||
| try (var conn = dataSource.getConnection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| var dataSource = getDataSource(); | |
| try (var conn = dataSource.getConnection(); | |
| try (var dataSource = getDataSource(); | |
| var conn = dataSource.getConnection(); |
|
|
||
| try { | ||
| conf.set(JDBC_URL, String.format("%s;shutdown=true", jdbcUrl)); | ||
| HikariJDBCUtil.getConnection(conf).close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both close connection and datasource should be wrapped inside of try to prevent test failure.
We can check the comment: database shutdown ignored exception, this catch means the test won't be failed if database shutdown throws exception.
.../src/test/java/org/apache/flink/autoscaler/standalone/AutoscalerEventHandlerFactoryTest.java
Outdated
Show resolved
Hide resolved
7d29111 to
909cbb9
Compare
|
Thanks, @1996fanrui for your patience with the review. I addressed the remaining comments and added a test case for db restart to |
Sounds make sense to me, testing restart in AbstractJdbcStateInteractorITCase is enough, but the test is failed. Also, I may need to do another round of overall review after the test passes. |
|
@1996fanrui It appears that MySQL/Postgresql DB restart test cases are failing in CI with SQLTransientException. Let me update the test to ignore transient exception as typically subsequent connection should be successful. Before this PR, the test case produces SQLNonTransientConnectionException. We may also consider adding retries but I feel it's overkill. lmk your thoughts |
909cbb9 to
9adf5f6
Compare
After I debug it on my Mac, I don't think catch Also, I added the retry mechanism on my Mac, but it still doesn't work, I found after restart, the port of MySQL Container is changed. It means the original data source won't be used.
|
|
That sounds great, let me remove the test. I'm not the expert on this situation either and ChatGPT originally suggested starting and stopping the container to simulate a restart. I will try to explore it in my free time and add the test case later |
9adf5f6 to
a441077
Compare
|
One test fails since |
| public void afterEach(ExtensionContext extensionContext) throws Exception { | ||
| try (var conn = getConnection(); | ||
| try (var dataSource = getDataSource(); | ||
| var conn = getDataSource().getConnection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| var conn = getDataSource().getConnection(); | |
| var conn = dataSource.getConnection(); |
@sharath1709 I found a bug here, it should use data source instead of getDataSource() here.
The second getDataSource() is not closed. That's why connection is leaked.
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sharath1709 for the patience!
LGTM. I will rebase master later, and merge it if the CI is green.
…ages to datasource
826f36a to
04a760a
Compare
04a760a to
de09534
Compare

What is the purpose of the change
This pull request makes changes to replace the usage of SQL connection with DataSource to improve the multithreaded performance of the JDBC plugin.
Brief change log
Verifying this change
This change is already covered by existing unit tests. The test cases have been modified appropriately to use DataSource rather than Connection everywhere
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: noDocumentation