Skip to content

Conversation

@Umang01-hash
Copy link
Member

@Umang01-hash Umang01-hash commented Aug 7, 2025

Pull Request Template

Description:

What is DBResolver?

  • Adds a DBResolver module to GoFr, which provides automatic read/write splitting for SQL databases.

  • Read queries (e.g., SELECT) are routed to read replicas, and write queries (e.g., INSERT, UPDATE) are routed to the primary database.

  • Seamlessly wraps the existing SQL datasource: does not require any application code changes for existing queries.

  • Developers interact with c.SQL exactly as before; all routing and failover are fully transparent.

Motivation & Benefits

  • Scalable horizontal read performance with multiple replicas
  • Reduced load on primary database
  • Fault-tolerant: automatic fallback to primary if all replicas fail
  • Clean metrics and tracing support for operational visibility

Example Usage:

import (
    "gofr.dev/pkg/gofr"

    "gofr.dev/pkg/gofr/datasource/dbresolver"
)

func main() {
    a := gofr.New()

    // Enable DBResolver with round-robin strategy and fallback
    resolver := dbresolver.NewProvider("round-robin", true)
    a.AddDBResolver(resolver)

    // Continue as usual: all routes and SQL logic unchanged
    a.GET("/db/read", DBReadHandler)
    a.POST("/db/write", DBWriteHandler)
    a.Run()
}

Configuration Example:

DB_HOST=localhost
DB_USER=root
DB_PASSWORD=rootpassword
DB_NAME=testdb
DB_PORT=3306
DB_DIALECT=mysql
DB_MAX_IDLE_CONNECTION=2
DB_MAX_OPEN_CONNECTION=0

# Replica hosts (comma-separated, e.g., on ports 3307 and 3308)
DB_REPLICA_HOSTS=localhost:3307,localhost:3308

Testing Strategy:

  • Primary and Replicas launched with docker-compose (primary:3306, replicas:3307/3308)

  • Replication automated by setup scripts; seed data from SQL dump and app endpoints

  • Load Testing: Performed with Apache JMeter simulating high-concurrency API requests (both reads and writes)

Results:

  • Zero error rate; throughput and latency showed no regression compared to the baseline.

  • Read/write split is fully performant, and scaling is achieved without application change.

Checklist:

  • I have formatted my code using goimport and golangci-lint.
  • All new code is covered by unit tests.
  • This PR does not decrease the overall code coverage.
  • I have reviewed the code comments and documentation for clarity.

Thank you for your contribution!

Umang01-hash and others added 17 commits July 28, 2025 16:36
Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.59.0 to 0.59.1.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@exporters/prometheus/v0.59.0...exporters/prometheus/v0.59.1)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.59.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.243.0 to 0.244.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.243.0...v0.244.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.244.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [gofr.dev](https://github.com/gofr-dev/gofr) from 1.42.4 to 1.42.5.
- [Release notes](https://github.com/gofr-dev/gofr/releases)
- [Commits](v1.42.4...v1.42.5)

---
updated-dependencies:
- dependency-name: gofr.dev
  dependency-version: 1.42.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@gizmo-rt
Copy link
Contributor

@Umang01-hash Can you add more details on sequence of R(Read) and W(Write) cases under which read and write replicas would be selected ?

@Umang01-hash
Copy link
Member Author

@Umang01-hash Can you add more details on sequence of R(Read) and W(Write) cases under which read and write replicas would be selected ?

Hey @gizmo-rt we first determine is a query is read/write and if it is a read query we check if healthyReplica is available or not. If the replica is available we select it and send the read query to it and if the replica is not available we send it to primary
database. Methods like QueryContext, Select have this logic. Methods like Exec , Prepare etc are directly using the primary db. Transaction methods (Begin, BeginTx) are always routed to the primary.

If all replicas are unhealthy at time of a read query and fallback is enabled, the read falls back to the primary. If fallback is disabled, the read fails with an error.

If a replica experiences multiple failures, it's circuit breaker opens and replica is temporarily skipped for queries. This timeout period is 30 seconds by default and we allow 5 failures for replica before cicuit breaker is opened.

For an example sequence like R,R,W,R:

1st R: Routed to replica

2nd R: Routed to next replica (depends on which strategy is choosen random or round-robin)

1st W: Routed to primary

3rd R: Routed to next replica

@gofr-dev gofr-dev deleted a comment from coolwednesday Aug 29, 2025
@Umang01-hash
Copy link
Member Author

This PR implements SQL-level routing (parsing queries) to split reads vs writes. Another approach is to use the HTTP method (e.g. GET=read, POST/PUT=write) to choose the DB. Given cases where a handler might do SELECT then UPDATE, or a GET-route that writes, which strategy do people prefer? Any thoughts on pros/cons?

@aryanmehrotra @ccoVeille @akshat-kumar-singhal any suggestions on it??

@akshat-kumar-singhal
Copy link
Contributor

This PR implements SQL-level routing (parsing queries) to split reads vs writes. Another approach is to use the HTTP method (e.g. GET=read, POST/PUT=write) to choose the DB. Given cases where a handler might do SELECT then UPDATE, or a GET-route that writes, which strategy do people prefer? Any thoughts on pros/cons?

@aryanmehrotra @ccoVeille @akshat-kumar-singhal any suggestions on it??

@Umang01-hash Any issues you see with the current implementation of identifying it based on the query?
Another way to think of this implementation is by drawing a parallel to cache. Given that we'd want to limit the visibility of cache/read DB within the store layer, the store layer should be able to decide whether to use primary or replica/cache based on the operation (SELECT/UPDATE etc) and transaction.

@aryanmehrotra
Copy link
Member

This PR implements SQL-level routing (parsing queries) to split reads vs writes. Another approach is to use the HTTP method (e.g. GET=read, POST/PUT=write) to choose the DB. Given cases where a handler might do SELECT then UPDATE, or a GET-route that writes, which strategy do people prefer? Any thoughts on pros/cons?
@aryanmehrotra @ccoVeille @akshat-kumar-singhal any suggestions on it??

@Umang01-hash Any issues you see with the current implementation of identifying it based on the query? Another way to think of this implementation is by drawing a parallel to cache. Given that we'd want to limit the visibility of cache/read DB within the store layer, the store layer should be able to decide whether to use primary or replica/cache based on the operation (SELECT/UPDATE etc) and transaction.

One issue which might occur is if there is a POST request and user is first inserting the value and then fetching it.
based on the query pattern which we have currently followed there are good chances that while fetching we will not get the resource, so the correct approach seems to be based on the request method type rather than query type.

@akshat-kumar-singhal
Copy link
Contributor

One issue which might occur is if there is a POST request and user is first inserting the value and then fetching it.
based on the query pattern which we have currently followed there are good chances that while fetching we will not get the resource, so the correct approach seems to be based on the request method type rather than query type.

@aryanmehrotra While selecting the primary/replica based on request method type feels safer, we may still have issues due to replication lag. Also, in case of poor implementation (by application dev), example - DB writes in GET call, we can have other issues.
We may want to consider supporting a config - list of end points (maybe) which would operate on the replica instead of primary. This would provide more control to the developers and heavier read operations can be safely routed to the replica.

@aryanmehrotra
Copy link
Member

One issue which might occur is if there is a POST request and user is first inserting the value and then fetching it.
based on the query pattern which we have currently followed there are good chances that while fetching we will not get the resource, so the correct approach seems to be based on the request method type rather than query type.

@aryanmehrotra While selecting the primary/replica based on request method type feels safer, we may still have issues due to replication lag. Also, in case of poor implementation (by application dev), example - DB writes in GET call, we can have other issues. We may want to consider supporting a config - list of end points (maybe) which would operate on the replica instead of primary. This would provide more control to the developers and heavier read operations can be safely routed to the replica.

We should document poor practices like DB writes in GET calls, since features like status code handling are based on request type and overriding them breaks best practices.

For routing queries, we could either:

Support a config with endpoints that always use replicas for heavy reads, or
Let users specify which queries run on which DB.

This covers more use cases, but ideally we want a cleaner approach where devs don’t have to manage this manually. If not, either of the options works.

@Umang01-hash
Copy link
Member Author

Umang01-hash commented Oct 28, 2025

Hey! Here are my sugegstions for the above problems/comments:

  1. Transaction Scoping: When developers use Begin() → Commit() (database transactions), all queries inside that block would automatically go to the primary database. This would solve the POST → INSERT → SELECT pattern because both operations happen on the same database.

  2. Read-After-Write Consistency: We could track when any write happens and for the next few seconds, automatically route all read queries to primary instead of replicas. This would handle cases where you INSERT data and immediately try to SELECT it outside of transactions.

  3. Why SQL-Based Is Still Better: HTTP methods (GET/POST) don't tell us what the actual SQL query does. A GET endpoint might cache data (write), or a POST might only validate data (read). Our SQL-based approach looks at the actual database operation (SELECT vs INSERT) rather than guessing from the HTTP method.

Question to all: Does the above approach sound reasonable? Would these two additions solve the consistency concerns while keeping the implementation simple? Happy to explore HTTP method-based routing if you think it's a better path forward.

@aryanmehrotra @ccoVeille @akshat-kumar-singhal @gizmo-rt - thoughts?

@aryanmehrotra
Copy link
Member

aryanmehrotra commented Oct 28, 2025

Hey! Here are my sugegstions for the above problems/comments:

  1. Transaction Scoping: When developers use Begin() → Commit() (database transactions), all queries inside that block would automatically go to the primary database. This would solve the POST → INSERT → SELECT pattern because both operations happen on the same database.

I highly doubt if someone in a transaction would do POST → INSERT → SELECT, as mostly we do insert/update in transactions, SELECT happens separetely.

Also, if we think about the three layer architecture, there are different method for creating an entry and getting that entry, right now we can't do that in a single transaction in gofr.

  1. Read-After-Write Consistency: We could track when any write happens and for the next few seconds, automatically route all read queries to primary instead of replicas. This would handle cases where you INSERT data and immediately try to SELECT it outside of transactions.

It would be very expensive, imagine writes or updates happening from a cron-job every 15 minutes then mostly it would be happening in a particular table, then we would also need to be table aware in that case. otherwise even after having the read/write split most queries would go to the write replica.

  1. Why SQL-Based Is Still Better: HTTP methods (GET/POST) don't tell us what the actual SQL query does. A GET endpoint might cache data (write), or a POST might only validate data (read). Our SQL-based approach looks at the actual database operation (SELECT vs INSERT) rather than guessing from the HTTP method.

What does the best-practise say about GET and POST request endpoints functioning and what we already follow in gofr?

@Umang01-hash
Copy link
Member Author

Umang01-hash commented Oct 28, 2025

@aryanmehrotra Thanks for your suggestions. Your points seem valid - transaction scoping wouldn't work in three-layer architecture, and global read-after-write tracking would be too expensive.

Request-Scoped Routing: Instead of query-level or global tracking, we can route all queries within a single HTTP request based on the HTTP method. So if someone hits POST /users, ALL database queries (both INSERT and SELECT) within that request handler automatically go to primary. GET requests continue using replicas.

This solves the POST → INSERT → SELECT consistency issue and aligns with REST best practices - GET should be safe (no side effects) and POST/PUT for state changes. GoFr already follows these patterns in its routing and status code handling. Thoughts on this approach?​

An extension on this as suggested by @akshat-kumar-singhal is:
Route-Level Configuration: We can also provide optional endpoint-level control where developers can configure specific routes to always use primary:

# Optional: Force specific endpoints to always use primary
DB_PRIMARY_ROUTES=/users/search,/reports/heavy,/admin/*

@akshat-kumar-singhal
Copy link
Contributor

akshat-kumar-singhal commented Oct 28, 2025 via email

@aryanmehrotra
Copy link
Member

aryanmehrotra commented Oct 28, 2025

@aryanmehrotra Thanks for your suggestions. Your points seem valid - transaction scoping wouldn't work in three-layer architecture, and global read-after-write tracking would be too expensive.

Request-Scoped Routing: Instead of query-level or global tracking, we can route all queries within a single HTTP request based on the HTTP method. So if someone hits POST /users, ALL database queries (both INSERT and SELECT) within that request handler automatically go to primary. GET requests continue using replicas.

This solves the POST → INSERT → SELECT consistency issue and aligns with REST best practices - GET should be safe (no side effects) and POST/PUT for state changes. GoFr already follows these patterns in its routing and status code handling. Thoughts on this approach?​

An extension on this as suggested by @akshat-kumar-singhal is: Route-Level Configuration: We can also provide optional endpoint-level control where developers can configure specific routes to always use primary:

# Optional: Force specific endpoints to always use primary
DB_PRIMARY_ROUTES=/users/search,/reports/heavy,/admin/*

Yes this seems better, also env variable shouldn't be used for this we should give option from the code itself, this can be part as an enhancement based on user feedback once we rollout the feature just based on the route level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants