
How rails sharding connection handling works

Henrique Gubert edited this page Jun 16, 2017 · 13 revisions

To understand what the rails-sharding gem does, you first need to understand how ActiveRecord handles its connections to the database, and how its existing interface can be used to connect to shards.

ActiveRecord v5 defines the following class hierarchy:

ConnectionHandler -[has many]-> ConnectionPool -[has many]-> Connection (Adapter-specific classes)

Let's understand what each one of these classes does.

The ConnectionHandler is the high-level interface to database connection management. A typical running Rails app has only one ConnectionHandler instance, which can be accessed via ActiveRecord::Base.connection_handler. The actual connection_handler getter is defined in ActiveRecord::Core#L134, which is included into ActiveRecord::Base.

The ConnectionHandler can manage several ConnectionPools, usually one per database. In a vanilla Rails app the ConnectionHandler has a single ConnectionPool. Besides holding the pools, the ConnectionHandler also deals with forked processes automatically. If a Rails app is forked, the child (new) process cannot use the same ConnectionPool as the parent process. The ConnectionHandler detects the fork and instantiates a new ConnectionPool for the child process, with the same specification as the parent's. This is completely transparent to the ConnectionHandler user.
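The fork handling described above can be pictured with a small plain-Ruby sketch. It is not ActiveRecord's code: ForkAwareHandler, ToyPool and the injectable pid_source are hypothetical names invented for illustration. The sketch assumes that pools are tracked per process id, and that a process without a pool of its own receives a fresh one built from the spec of an existing pool.

```ruby
# Hypothetical toy classes, not part of ActiveRecord.
ToyPool = Struct.new(:spec, :owner_pid)

class ForkAwareHandler
  # pid_source is injectable so a "fork" can be simulated without forking.
  def initialize(pid_source = -> { Process.pid })
    @pid_source = pid_source
    @pools_by_pid = {} # pid => { spec_name => ToyPool }
  end

  def establish_connection(spec_name, spec)
    pools_for_current_process[spec_name] = ToyPool.new(spec, current_pid)
  end

  # Returns this process's pool, creating one from an existing pool's
  # spec when the current process does not have its own yet.
  def retrieve_connection_pool(spec_name)
    pools = pools_for_current_process
    pools[spec_name] ||= copy_from_other_process(spec_name)
  end

  private

  def current_pid
    @pid_source.call
  end

  def pools_for_current_process
    @pools_by_pid[current_pid] ||= {}
  end

  def copy_from_other_process(spec_name)
    inherited = @pools_by_pid.values.map { |pools| pools[spec_name] }.compact.first
    inherited && ToyPool.new(inherited.spec, current_pid)
  end
end

# Simulate a parent process (pid 100) and a forked child (pid 101).
fake_pid = 100
handler = ForkAwareHandler.new(-> { fake_pid })
handler.establish_connection("primary", { adapter: "sqlite3" })
parent_pool = handler.retrieve_connection_pool("primary")

fake_pid = 101 # pretend we are now running in the child
child_pool = handler.retrieve_connection_pool("primary")
```

The caller asks for the "primary" pool in exactly the same way in both processes, which is the sense in which the mechanism is transparent to the user.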

To create a new ConnectionPool you call ConnectionHandler#establish_connection, which I find poorly named, as it doesn't actually establish a connection to the database. It simply creates an empty ConnectionPool associated with a spec_name (just a plain string, which is "primary" by default).

Once the pool is created, you can retrieve it by spec_name using ConnectionHandler#retrieve_connection_pool, or you can retrieve a DB connection directly by passing the spec_name to ConnectionHandler#retrieve_connection.
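To make the point about naming concrete, here is a minimal plain-Ruby sketch of the handler, pool and connection relationship. The Toy* classes are hypothetical stand-ins written for this page, not ActiveRecord's implementation; only the three method names mirror the ones discussed above. The sketch assumes that a pool opens its first connection lazily, when one is requested.

```ruby
# Hypothetical toy classes, not part of ActiveRecord.
class ToyConnection
  attr_reader :spec_name

  def initialize(spec_name)
    @spec_name = spec_name
  end
end

class ToyConnectionPool
  attr_reader :spec_name, :connections

  def initialize(spec_name)
    @spec_name = spec_name
    @connections = [] # starts empty: nothing has been opened yet
  end

  # Opens a connection on first request and reuses it afterwards.
  def connection
    @connections << ToyConnection.new(spec_name) if @connections.empty?
    @connections.first
  end
end

class ToyConnectionHandler
  def initialize
    @pools = {} # spec_name => ToyConnectionPool
  end

  # Despite the name, this only registers an empty pool.
  def establish_connection(spec_name = "primary")
    @pools[spec_name] = ToyConnectionPool.new(spec_name)
  end

  def retrieve_connection_pool(spec_name)
    @pools[spec_name]
  end

  def retrieve_connection(spec_name)
    pool = retrieve_connection_pool(spec_name)
    raise ArgumentError, "no pool for #{spec_name.inspect}" unless pool
    pool.connection
  end
end

handler = ToyConnectionHandler.new
handler.establish_connection("primary")
pool = handler.retrieve_connection_pool("primary")
connections_before = pool.connections.size # pool exists, but holds no connection

conn = handler.retrieve_connection("primary")
connections_after = pool.connections.size # a connection was opened on demand
```

In this toy version the pool holds no connection until retrieve_connection is called, which is why establish_connection reads as a misleading name.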

Of course, Rails users don't usually know about the ConnectionHandler, and they also don't know things such as the spec_name of the database they're using. If you needed to get a connection or connection_pool through the ConnectionHandler, you'd call:

ActiveRecord::Base.connection_handler.retrieve_connection_pool("primary")
ActiveRecord::Base.connection_handler.retrieve_connection("primary")

However, Rails sugar-coats this process by extending ActiveRecord::Base with the ConnectionHandling module, which makes accessing DB connections much simpler:

ActiveRecord::Base.connection_pool
ActiveRecord::Base.connection
ActiveRecord::Base.connection_pool.with_connection { |connection| ... }

Also, when different models are stored in different databases, ConnectionHandling already returns the connection or connection pool for the right database, depending on which model you call the methods on:

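class ModelA < ActiveRecord::Base
  # connection_specification_name is a class-level attribute, so it is assigned
  self.connection_specification_name = "some_secondary_database"
end

# Called on the model itself, these resolve to the secondary database
ModelA.connection_pool
ModelA.connection
ModelA.connection_pool.with_connection { |connection| ... }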
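How a model ends up with the right database can be pictured as a lookup that walks up the class hierarchy. The sketch below is plain Ruby with hypothetical Toy* classes standing in for ActiveRecord::Base and its models. It assumes that a class without its own specification name falls back to its parent's, ending at the base class with "primary"; treat it as an illustration of the idea rather than the actual Rails source.

```ruby
# Hypothetical stand-in for ActiveRecord::Base; only the name lookup is modelled.
class ToyBase
  class << self
    attr_writer :connection_specification_name

    # Use the name set on this class; otherwise ask the parent class.
    # The chain ends at ToyBase, which defaults to "primary".
    def connection_specification_name
      return @connection_specification_name if @connection_specification_name
      self == ToyBase ? "primary" : superclass.connection_specification_name
    end
  end
end

class ToyModelA < ToyBase
  self.connection_specification_name = "some_secondary_database"
end

class ToyModelB < ToyBase; end   # no name set: inherits "primary"
class ToyModelC < ToyModelA; end # no name set: inherits from ToyModelA

names = [ToyBase, ToyModelA, ToyModelB, ToyModelC].map(&:connection_specification_name)
```

Because the name is resolved per class, the same call on two different models can hand back pools for two different databases.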

connection_specification_name
