
Conversation

@aleksandarskrbic aleksandarskrbic commented Nov 19, 2025

What kind of change does this PR introduce?

Fixes #30

What is the new behavior?

New Endpoints

Created new etl-validation crate with validation logic separated from API concerns:

  • SourceValidator trait for PostgreSQL validation
  • DestinationValidator trait for BigQuery and Iceberg validation
  • Proper error types distinguishing connection failures from configuration issues

A further benefit of etl-validation is that the same validation logic can be reused before a pipeline even starts, because with the current implementation that is where all potential source/destination errors would first appear.
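The trait split might look roughly like this; the ValidationError variants and method signatures below are assumptions for illustration, not the crate's actual API:

```rust
use std::fmt;

// Hypothetical error type: the point is distinguishing "could not reach the
// system" from "reached it, but it is misconfigured".
#[derive(Debug)]
pub enum ValidationError {
    Connection(String),    // e.g. network timeout, bad host
    Configuration(String), // e.g. wal_level != 'logical', bad credentials
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::Connection(msg) => write!(f, "connection failed: {msg}"),
            ValidationError::Configuration(msg) => write!(f, "invalid configuration: {msg}"),
        }
    }
}

impl std::error::Error for ValidationError {}

// Sketched with return-position impl Trait to stay dependency-free; the real
// crate may use `async_trait` instead.
pub trait SourceValidator {
    fn validate(&self) -> impl std::future::Future<Output = Result<(), ValidationError>>;
}

pub trait DestinationValidator {
    fn validate(&self) -> impl std::future::Future<Output = Result<(), ValidationError>>;
}
```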

POST /v1/sources/test-connection

  • Validates PostgreSQL connection configuration
  • Checks database version (14, 15, 16, 17 supported)
  • Verifies WAL level is set to 'logical'
  • Confirms user has replication privileges
  • Ensures replication slots are available
  • Returns 200 OK if valid, 500 with error details if not
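The version/WAL/privilege/slot checks above boil down to comparing a few server settings. A minimal sketch of that decision logic, with a hypothetical check_source helper fed by values that would be read via SHOW server_version_num, SHOW wal_level, pg_roles, and pg_replication_slots:

```rust
// Hypothetical helper, not the PR's code: decide whether a Postgres server
// satisfies the connection-test checks given values already fetched from it.
fn check_source(
    version_num: i32,      // from SHOW server_version_num, e.g. 160004
    wal_level: &str,       // from SHOW wal_level
    has_replication: bool, // rolreplication from pg_roles
    free_slots: i64,       // max_replication_slots minus used slots
) -> Result<(), String> {
    let major = version_num / 10000; // 160004 -> 16
    if !(14..=17).contains(&major) {
        return Err(format!("unsupported PostgreSQL major version {major}"));
    }
    if wal_level != "logical" {
        return Err(format!("wal_level must be 'logical', found '{wal_level}'"));
    }
    if !has_replication {
        return Err("user lacks the REPLICATION privilege".to_string());
    }
    if free_slots <= 0 {
        return Err("no free replication slots (raise max_replication_slots)".to_string());
    }
    Ok(())
}
```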

POST /v1/destinations/test-connection

  • Validates BigQuery service account credentials
  • Tests Iceberg REST catalog connectivity
  • Verifies Iceberg Supabase catalog access
  • Tests S3 endpoint accessibility
  • Returns 200 OK if valid, 500 with error details if not
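Since the destination endpoint probes several independent systems (BigQuery, Iceberg catalog, S3), one way to structure it is to run named checks in order and fail fast with the name of the failing probe. A dependency-free sketch (run_checks is a hypothetical helper, not the PR's code):

```rust
// Run each named probe in order; stop at the first failure and prefix the
// error with the check's name so the caller knows which system failed.
fn run_checks(checks: &[(&str, fn() -> Result<(), String>)]) -> Result<(), String> {
    for (name, check) in checks {
        check().map_err(|e| format!("{name}: {e}"))?;
    }
    Ok(())
}
```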

Additional context

Test Infrastructure Improvements

  • Fixed TestApp Drop implementation to handle cleanup errors gracefully instead of panicking
  • Changed drop_pg_database to log warnings instead of panicking on connection failures
  • Added panic safety with catch_unwind to prevent cleanup failures from crashing test suite
  • Added graceful shutdown delay to allow server connections to close before database deletion
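The Drop-plus-catch_unwind pattern described above can be sketched as follows; TestApp and drop_pg_database here are simplified stand-ins for the real test harness, with the cleanup deliberately panicking to exercise the containment path:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

struct TestApp {
    db_name: String,
}

// Stand-in for the real helper, which connects to Postgres and issues
// DROP DATABASE. Simulate a connection failure to show the catch_unwind path.
fn drop_pg_database(db_name: &str) {
    panic!("could not connect to postgres to drop {db_name}");
}

impl Drop for TestApp {
    fn drop(&mut self) {
        // catch_unwind keeps a cleanup panic from tearing down the whole test
        // process (a panic inside Drop during another unwind would abort).
        let result = catch_unwind(AssertUnwindSafe(|| drop_pg_database(&self.db_name)));
        if result.is_err() {
            eprintln!("warning: cleanup failed for {}, continuing", self.db_name);
        }
    }
}
```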

@aleksandarskrbic aleksandarskrbic requested a review from a team as a code owner November 19, 2025 16:29
@aleksandarskrbic aleksandarskrbic changed the title Add Connection Validation Endpoints for Sources and Destinations feat(etl-api, etl-validation): connection validation endpoints Nov 19, 2025
@iambriccardo
Contributor

@codex review

@iambriccardo
Contributor

Thanks for your contribution! Given our production setup, this validation is good but not bulletproof. I am not sure we want to bring this much code into the project as of now, but if we see the need, we will definitely come back to this PR!

I will use your ideas and improvements to the test harness though; those are nice improvements, and thanks a lot for them!


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +306 to +309
pg_config
    .validate()
    .await
    .map_err(|e| SourcesDbError::Database(sqlx::Error::Configuration(Box::new(e))))?;

P1 Badge Return source validation failures as client errors

Validation failures from pg_config.validate() are wrapped into SourcesDbError::Database(...) here, which SourceError::status_code returns as HTTP 500 with the generic "internal server error" body. The OpenAPI annotation for this endpoint advertises 400 for failed connection tests, and the new ValidationError types distinguish misconfiguration from server faults. With the current mapping, bad credentials, non‑logical wal_level, or missing replication slots will be reported as 500s with no actionable message, preventing clients from telling user errors apart from real server failures.

Comment on lines +305 to +309
destination_config.validate().await.map_err(|e| {
    DestinationsDbError::Database(sqlx::Error::Configuration(Box::new(std::io::Error::new(
        std::io::ErrorKind::Other,
        e,
    ))))
})?;

P1 Badge Destination validation errors mapped to 500 internal error

Destination connection validation errors are converted into DestinationsDbError::Database(sqlx::Error::Configuration(..)), which DestinationError renders as HTTP 500 with the generic "internal server error" message. The route’s OpenAPI docs include a 400 response for connection test failures, and ValidationError provides specific, user‑actionable reasons (e.g., bad BigQuery key, unreachable Iceberg catalog). Currently all such user errors surface as 500s without details, making clients unable to tell misconfiguration from server faults.
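Both review comments point at the same fix: surface ValidationError as a client error with an actionable body instead of wrapping it into a database error that renders as a generic 500. A framework-agnostic sketch (names assumed, not the PR's actual types):

```rust
// Hypothetical error type mirroring the PR's ValidationError split.
enum ValidationError {
    Connection(String),    // the target could not be reached from the API
    Configuration(String), // bad credentials, wrong wal_level, missing slots
}

// For a connection *test* endpoint, both outcomes are information for the
// caller, so both map to 400 per the route's OpenAPI docs rather than 500.
fn status_code(err: &ValidationError) -> u16 {
    match err {
        ValidationError::Configuration(_) => 400,
        ValidationError::Connection(_) => 400,
    }
}

// Keep the specific, user-actionable reason in the response body.
fn body(err: &ValidationError) -> String {
    match err {
        ValidationError::Configuration(reason) => format!("invalid configuration: {reason}"),
        ValidationError::Connection(reason) => format!("connection test failed: {reason}"),
    }
}
```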

@aleksandarskrbic
Author

@iambriccardo Thanks for the feedback!
If the current implementation isn't sufficient, could you clarify what additional checks would be required?
I am eager to implement them once I know what would be better.

If the issue scope has changed or is no longer needed, should we close issue #30?

@iambriccardo
Contributor

Your implementation is perfectly fine. The main issue I was referring to is that our replicator instances are running in the same network as the API (which performs the validation). However, we should not make any assumptions about that. For example, the API might reach the systems while the replicators cannot for some other reason. This is why the validation is helpful from a UX standpoint, but allowing the pipeline to fail and report its own problem is also a valid solution. It results in slightly worse UX but requires no code changes.

@aleksandarskrbic
Author


@iambriccardo
Thanks for the explanation, now it's 100% clear to me why this is not a perfect solution.
A proper solution would probably require testing from the replicator's network context (spinning up test pods), which would add significant complexity that can't be justified by slightly better UX.


Development

Successfully merging this pull request may close these issues.

Add API endpoints to test connection details for sources and sinks

2 participants